1
Monitoring Kafka like a Pro
Xavier Léauté, Software Engineer

Gwen Shapira, Software Engineer
2
In which we’ll review:
- The basics of monitoring Kafka brokers
- The basics of monitoring Kafka clients
- Advanced techniques for monitoring Kafka clients
3
Apache Kafka in 3 slides
[Diagram: a Kafka Cluster with a Producer, a Consumer, Stream Processing Apps, and Connectors on both sides]
Partitions
• Kafka organizes messages into topics
• Each topic has a set of partitions
• Each partition is a replicated log of messages, referenced by sequential offset
[Diagram: Partition 0 holds offsets 0 to 5, Partition 1 holds offsets 0 to 7, Partition 2 holds offsets 0 to 4]
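The idea of a partition as an append-only log addressed by sequential offsets can be sketched with a toy model. This is only an illustration: the `Partition` class and its methods are hypothetical names, not Kafka's implementation or client API, and replication is left out.

```python
from typing import List


class Partition:
    """Toy append-only log: each message gets the next sequential offset."""

    def __init__(self) -> None:
        self._log: List[str] = []

    def append(self, message: str) -> int:
        """Append a message and return the offset it was stored at."""
        self._log.append(message)
        return len(self._log) - 1

    def read(self, offset: int) -> str:
        """Return the message stored at the given offset."""
        return self._log[offset]

    def end_offset(self) -> int:
        """Offset that the next appended message will receive."""
        return len(self._log)
```

Offsets are per partition, so two partitions of the same topic each start at offset 0.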
Replication
• Each partition is replicated 3 times
• Each replica lives on a separate broker
• The leader handles all reads and writes
• Followers replicate events from the leader
[Diagram: a Producer writes to a partition with offsets 0 to 7, held on Replica 1, Replica 2, and Replica 3]
7
Monitoring Brokers 101
9
Can your users
produce / consume?
10
Canary
● Have a partition leader on every broker
● Produce and consume
● Every 15 seconds
● Yell if there are 4 consecutive misses
● Do this as close as possible to the users
● Advanced: measure latency
[Diagram: Broker 100 and Broker 101 each hold a replica of Partition 1 and Partition 2; Replica 100 is the leader of Partition 1 and Replica 101 is the leader of Partition 2]
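The "yell if 4 consecutive misses" rule can be sketched as a small counter. This is a minimal illustration under stated assumptions: `CanaryAlert` and `run_probes` are hypothetical names, and the `probe` callable stands in for a real produce-and-consume round trip against the cluster, which is not shown here.

```python
from typing import Callable, List


class CanaryAlert:
    """Tracks canary probe results and fires after N consecutive misses."""

    def __init__(self, max_consecutive_misses: int = 4) -> None:
        self.max_consecutive_misses = max_consecutive_misses
        self.consecutive_misses = 0

    def record(self, probe_succeeded: bool) -> bool:
        """Record one probe result; return True when the alert should fire."""
        if probe_succeeded:
            # Any successful round trip resets the streak.
            self.consecutive_misses = 0
        else:
            self.consecutive_misses += 1
        return self.consecutive_misses >= self.max_consecutive_misses


def run_probes(probe: Callable[[], bool], alert: CanaryAlert, rounds: int) -> List[bool]:
    """Run the probe `rounds` times and return the alert state after each round.

    A real canary would wait 15 seconds between rounds; the wait is omitted here.
    """
    return [alert.record(probe()) for _ in range(rounds)]
```

Requiring a streak of misses rather than a single miss keeps one slow round trip from paging anyone.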
11
Are the brokers up?
Are metrics being reported?

Is the process up?
12
Are partitions up?
Offline partitions?
Under-replicated partitions?
Under min.isr partitions?
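The three partition questions above map to simple comparisons on a partition's leader, replica list, and in-sync replica (ISR) list. The sketch below assumes that state has already been fetched from the cluster; `PartitionState` and `partition_problems` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartitionState:
    leader: Optional[int]  # broker id of the current leader, None if there is none
    replicas: List[int]    # broker ids assigned to hold a replica
    isr: List[int]         # broker ids currently in sync with the leader


def partition_problems(state: PartitionState, min_insync_replicas: int) -> List[str]:
    """Return the health problems that apply to one partition."""
    problems: List[str] = []
    if state.leader is None:
        # No leader means nobody can produce to or consume from the partition.
        problems.append("offline")
    if len(state.isr) < len(state.replicas):
        problems.append("under-replicated")
    if len(state.isr) < min_insync_replicas:
        # Producers using acks=all are rejected below min.insync.replicas.
        problems.append("under-min-isr")
    return problems
```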
13
Are there enough resources?
Bandwidth?
CPU?
Disk space?
14
Other important metrics
- Active Controller
- ZK Disconnects
- Unclean leader elections
- ISR Shrink / expand
- Network / Request idle %
- Produce/consume request total time
- Drops in throughput
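As one example of turning these metrics into a check, the active controller metric is usually evaluated across the whole cluster: each broker reports `kafka.controller:type=KafkaController,name=ActiveControllerCount`, and the values should sum to exactly one. The sketch below assumes the per-broker values have already been collected; `controller_status` is a hypothetical helper, not part of any Kafka tooling.

```python
from typing import Dict


def controller_status(active_controller_counts: Dict[int, int]) -> str:
    """Classify cluster controller health from per-broker ActiveControllerCount values.

    Keys are broker ids, values are the metric value each broker reported.
    """
    total = sum(active_controller_counts.values())
    if total == 1:
        return "ok"
    if total == 0:
        return "no-controller"
    return "multiple-controllers"
```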
15
Alerts that page
must be critical
and actionable.
16
Monitoring Clients 101
18
Famous last words…
“You just consume, and
produce. How hard can
this be?”
19
Monitor Consumer Lag
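Consumer lag for a partition is the distance between the log end offset and the offset the consumer group has committed. The sketch below assumes both sets of offsets have already been fetched; `consumer_lag` is a hypothetical helper, and treating a partition with no committed offset as committed at 0 is a simplifying assumption.

```python
from typing import Dict


def consumer_lag(log_end_offsets: Dict[int, int], committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Return lag per partition: log end offset minus committed offset.

    Partitions without a committed offset are treated as committed at 0
    (an assumption for this sketch). Lag is never reported as negative.
    """
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in log_end_offsets.items()
    }
```

Alerting on the maximum lag across partitions, and on whether it keeps growing, is usually more useful than alerting on any single absolute value.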
20
Simple and elegant design
[Diagram labels: Origin, Producer, Buffer, Consumer, Buffer, Destination; note on the producer side: block when buffer is full]
Producer metrics:
- io-ratio
- io-wait-ratio
- outgoing-byte-rate
- batch-size-avg
- batch-size-max
- record-retry-rate
- record-error-rate
- waiting-threads
- bufferpool-wait-time
Consumer metrics:
- io-ratio
- io-wait-ratio
- bytes-consumed-rate
- fetch-size-avg
- fetch-size-max
- fetch-rate
- records-lag-max
21
Sometimes you need to dig deep.
Advanced client monitoring.
With flame graphs.
23
Why Profile Streaming Applications?
Understand your bottlenecks
Metrics are sometimes hard to come by
Metrics don’t tell the full picture; they assume you know what to look for
Production is the only environment that matters
24
Profiling Streaming Applications
Most of the time is probably not spent in your application code
Deserializing / Serializing payloads becomes significant
I/O matters a lot more (state management + network)
25
My tool of choice
Async Profiler
Can profile applications online; attaches to any running JVM
Supports older JDKs (7 and above)
No need for special JVM flags
Merges Linux perf events with JVM profiling
Low overhead
26
There are other good tools as well
Java Flight Recorder
JFR is open source as of OpenJDK 11
Includes Mission Control, a powerful tool for analyzing JFR dumps
Less useful for native code profiling (no perf event data)
Requires upgrading to JDK11 for most users
27
Async Profiler
git clone https://github.com/jvm-profiling-tools/async-profiler
cd async-profiler
make
./profiler.sh -d 30 -f flamegraph.svg <pid>
28
Flamegraph 101 – here’s where your CPU cycles went
[Flame graph: width shows % CPU time, height shows the stack. Highlighted regions: GC, RocksDB, Kafka poll() loop, actual processing time]
29
Understanding where you spend your time
Streaming applications are complex
Analyze the proportion of time spent:
• Fetching the data (including de-serializing)
• Processing the data
• Sending the data (including serializing)
CPU usage / load are poor capacity utilization metrics
Use the ratio of time spent to wall clock for capacity planning
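The proportions above can be computed from per-stage timings. The sketch below is one possible way to do it, assuming the time spent fetching, processing, and sending has already been measured (for example from a profile); `time_breakdown` and its parameter names are hypothetical.

```python
from typing import Dict


def time_breakdown(fetch_s: float, process_s: float, send_s: float, wall_clock_s: float) -> Dict[str, float]:
    """Return each stage's share of busy time, plus busy time relative to wall clock."""
    if wall_clock_s <= 0:
        raise ValueError("wall_clock_s must be positive")
    busy_s = fetch_s + process_s + send_s
    if busy_s == 0:
        # Nothing was measured, so every share is zero.
        return {"fetch": 0.0, "process": 0.0, "send": 0.0, "utilization": 0.0}
    return {
        "fetch": fetch_s / busy_s,
        "process": process_s / busy_s,
        "send": send_s / busy_s,
        # Ratio of time spent to wall clock: the capacity-planning signal.
        "utilization": busy_s / wall_clock_s,
    }
```

A utilization of 0.5 suggests the application could absorb roughly twice the load before it is busy all the time, which CPU usage alone would not reveal.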
30
Detecting Unexpected Side-Effects – Before
23% of time spent seeking in RocksDB
during cache flushes
31
Detecting Unexpected Side-Effects – After
5% of time spent in RocksDB
32
Detecting Unwanted Side-effects
Cache flushes were calling inefficient methods
Cache flushes were never a problem in testing
Production load caused streams cache to fill up instantly (always flushing)
Increased cache size gave us an order of magnitude improvement in performance
33
Helps understand I/O bottlenecks
[Flame graph callouts: time spent writing to socket; time spent generating SSL random data]
34
Per-Thread Profiling
Stream threads process heterogeneous workloads
Important to understand which tasks may be the bottleneck
./profiler.sh -d <time> -t -f out.svg <pid>
35
Per-Thread Profiling
36
Detecting lock contention
Time spent waiting on locks / timeouts does not show up in CPU usage
./profiler.sh -d <time> -e lock -o svg=total -f out.svg <pid>
37
Detecting lock contention
[Flame graph callouts: 130ms waiting on ConsumerCoordinator; 225ms waiting on metric initialization locks]
38
Detecting lock contention
Detecting locking issues has helped us fix timing bugs
Code was calling consumer.poll() with a timeout in the critical state restore loop
This becomes a problem when data comes in at a low rate (e.g. 1 per second)
-> State restore went from minutes to seconds
39
Allocation profiling
./profiler.sh -d <time> -e alloc -f out.svg <pid>
40
Allocation profiling
[Flame graph callout: byte[] and ByteBuffer allocations]
41
Allocation profiling
Addressing unnecessary memory allocation helps:
• reduce GC pressure – especially long lived objects
• improve performance – large byte arrays can be expensive to initialize
Often seen at serialization / deserialization + compression stages
-> Helped us improve performance for small messages by an order of magnitude
42
What about Containers?
43
Profiling Container Workloads
Run from inside the container
• may still require host level access
• requires shipping profiling tools with your image
Run outside container
• requires copying agent library into the container
44
Profiling on Google Kubernetes Engine
kubectl get pods -o wide
NAME            READY  STATUS   RESTARTS  AGE  IP         NODE
my-streams-app  1/1    Running  0         72m  12.42.1.2  gke-bc5762429427-default-pool-c91f8be8-cbkq
gcloud compute ssh <node-id>
Problems
• Home is mounted non-executable
• Cannot install packages directly on host
45
GKE Toolbox
Everything needs to run inside a container
Enter the toolbox container
toolbox --bind $HOME:/host
apt-get install anything you need…
Compile async-profiler, or download a pre-compiled build, into the toolbox home
Note: libstdc++ in the target container must be compatible with the version async-profiler was compiled against
Copy the agent library to the host
cp ~/async-profiler/build/libasyncProfiler.so /host/libasyncProfiler.so
46
Setting up async-profiler in GKE
Outside the toolbox
Copy the agent library into the target Docker container
The path in the container must exactly match the path in the toolbox (e.g. /root/async-profiler/build)
docker exec -u0 <container-id> mkdir -p /root/async-profiler/build
docker cp libasyncProfiler.so <container-id>:/root/async-profiler/build/libasyncProfiler.so
Turn on necessary kernel options (4.6 and above)
echo 1 | sudo tee /proc/sys/kernel/perf_event_paranoid
echo 0 | sudo tee /proc/sys/kernel/kptr_restrict
47
Running async-profiler in GKE
Inside the toolbox
~/async-profiler/profiler.sh -d 30 -f /tmp/flamegraph.svg <pid>
Can’t find /tmp/flamegraph.svg?
The output path is resolved inside the application container
Outside the toolbox
docker cp <container-id>:/tmp/flamegraph.svg ~/
Do Try This At Home
https://github.com/jvm-profiling-tools/async-profiler
https://www.confluent.io/download/
https://slackpass.io/confluentcommunity
https://www.confluent.io/blog
25% off
Kafka Summit SF 2019
code ks19meetup
Thank you!
@gwenshap

gwen@confluent.io
@xvrl

xavier@confluent.io

Index conf sparkml-feb20-n-pentreathChester Chen
 
Index conf sparkai-feb20-n-pentreath
Index conf sparkai-feb20-n-pentreathIndex conf sparkai-feb20-n-pentreath
Index conf sparkai-feb20-n-pentreathChester Chen
 

SFBigAnalytics_20190724: Monitor kafka like a Pro

  • 1. Monitoring Kafka like a Pro. Xavier Léauté, Software Engineer; Gwen Shapira, Software Engineer
  • 2. In which we’ll review: the basics of monitoring Kafka brokers; the basics of monitoring Kafka clients; advanced techniques for monitoring Kafka clients
  • 3. Apache Kafka in 3 slides
  • 4. [Diagram: producers, consumers, stream processing apps, and connectors around a Kafka cluster]
  • 5. Partitions: Kafka organizes messages into topics. Each topic has a set of partitions. Each partition is a replicated log of messages, referenced by sequential offset. [Diagram: partitions 0, 1, and 2, each a sequence of offsets starting at 0]
  • 6. Replication: Each partition is replicated 3 times. Each replica lives on a separate broker. The leader handles all reads and writes. Followers replicate events from the leader. [Diagram: a producer and replicas 1, 2, and 3, each holding offsets 0 to 7]
  • 8. [Diagram: producers, consumers, stream processing apps, and connectors around a Kafka cluster]
  • 10. Canary: Lead a partition on every broker. Produce and consume every 15 seconds. Yell if 4 consecutive misses. Do this as close as possible to the users. Advanced: measure latency. [Diagram: brokers 100 and 101 each hold a replica of partitions 1 and 2; broker 100 leads partition 1 and broker 101 leads partition 2]
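The "yell if 4 consecutive misses" rule from the canary slide can be sketched as a small counter. This is a minimal illustration, not the speakers' implementation: the class name `CanaryAlert` and the helper `first_alert_index` are hypothetical, and the probe itself (producing and consuming a message every 15 seconds) is assumed to happen elsewhere and report a simple success flag.

```python
from dataclasses import dataclass


@dataclass
class CanaryAlert:
    """Tracks consecutive canary misses for one broker (hypothetical helper)."""
    threshold: int = 4           # from the slide: yell after 4 consecutive misses
    consecutive_misses: int = 0

    def record(self, success: bool) -> bool:
        """Record one probe result; return True when an alert should fire."""
        if success:
            # Any successful produce/consume round trip resets the streak.
            self.consecutive_misses = 0
        else:
            self.consecutive_misses += 1
        return self.consecutive_misses >= self.threshold


def first_alert_index(results, threshold=4):
    """Return the index of the first probe that triggers an alert, or None."""
    alert = CanaryAlert(threshold=threshold)
    for i, ok in enumerate(results):
        if alert.record(ok):
            return i
    return None
```

Requiring a streak rather than a single miss keeps one dropped probe from paging anyone, which fits the later point that paging alerts must be critical and actionable.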
  • 11. Are the brokers up? Are metrics being reported? Is the process up?
  • 12. Are partitions up? Offline partitions? Under-replicated partitions? Under min.isr partitions?
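The three partition questions above map to simple conditions on a partition's leader, replica list, and in-sync replica set (ISR). The sketch below is only an illustration under stated assumptions: the function `partition_status` and its argument shapes are hypothetical, and in practice the brokers report these conditions through their own metrics rather than through code like this.

```python
def partition_status(leader, replicas, isr, min_isr):
    """Classify one partition's health.

    leader   -- broker id of the current leader, or None if there is none
    replicas -- list of broker ids assigned to the partition
    isr      -- list of broker ids currently in sync with the leader
    min_isr  -- the topic's min.insync.replicas setting
    """
    problems = []
    if leader is None:
        # No leader: producers and consumers cannot use this partition.
        problems.append("offline")
    if len(isr) < len(replicas):
        # Some assigned replicas have fallen out of sync.
        problems.append("under-replicated")
    if len(isr) < min_isr:
        # Too few in-sync replicas to satisfy min.insync.replicas.
        problems.append("under-min-isr")
    return problems
```

For example, a partition with replicas on brokers 100, 101, and 102 but only broker 100 in the ISR, with `min_isr=2`, is both under-replicated and under min.isr.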
  • 13. Are there enough resources? Bandwidth? CPU? Disk space?
  • 14. Other important metrics: active controller; ZK disconnects; unclean leader elections; ISR shrink / expand; network / request idle %; produce / consume request total time; drops in throughput
  • 15. Alerts that page must be critical and actionable.
  • 17. [Diagram: producers, consumers, stream processing apps, and connectors around a Kafka cluster]
  • 18. Famous last words… “You just consume, and produce. How hard can this be?”
  • 20-27. Simple and elegant design: Origin, Consumer, Buffer, Producer, Destination; block when the buffer is full. Consumer metrics: io-ratio, io-wait-ratio, bytes-consumed-rate, fetch-size-avg, fetch-size-max, fetch-rate, records-lag-max. Producer metrics: io-ratio, io-wait-ratio, outgoing-byte-rate, batch-size-avg, batch-size-max, record-retry-rate, record-error-rate, waiting-threads, bufferpool-wait-time.
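The lag metric that closes this slide is the largest gap, across the partitions a consumer reads, between the end of the log and the consumer's current position. A minimal sketch of that arithmetic follows; the function names and the dictionary inputs are hypothetical stand-ins for values the consumer client tracks internally.

```python
def partition_lag(log_end_offset, position):
    """Records the consumer still has to read on one partition."""
    return max(0, log_end_offset - position)


def max_lag(end_offsets, positions):
    """Largest per-partition lag across all assigned partitions.

    end_offsets -- {partition: offset of the next record to be written}
    positions   -- {partition: offset of the next record the consumer will read}
    """
    return max(
        (partition_lag(end_offsets[p], positions[p]) for p in positions),
        default=0,  # no assigned partitions means no lag
    )
```

A steadily growing maximum lag suggests the consumer cannot keep up with the producers, while a flat or shrinking value suggests it has capacity to spare.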
  • 28. Sometimes you need to dig deep. Advanced client monitoring. With flame graphs.
  • 29. [Diagram: producers, consumers, stream processing apps, and connectors around a Kafka cluster]
  • 30. Why profile streaming applications? Understand your bottlenecks. Metrics are sometimes hard to come by. Metrics don’t tell the full picture: they assume you know what to look for. Production is the only environment that matters.
  • 31. Profiling streaming applications: Most of the time is probably not spent in your application code. Deserializing / serializing payloads becomes significant. I/O matters a lot more (state management + network).
  • 32. My tool of choice: Async Profiler. Can profile applications online; attaches to any running JVM. Supports older JDKs (7 and above). No need for special JVM flags. Merges Linux perf events with JVM profiling. Low overhead.
  • 33. There are other good tools as well: Java Flight Recorder. JFR is now open source with OpenJDK 11. Includes Mission Control, a powerful tool for analyzing JFR dumps. Less useful for native code profiling (no perf event data). Requires upgrading to JDK 11 for most users.
  • 34. Async Profiler: git clone https://github.com/jvm-profiling-tools/async-profiler; cd async-profiler; make; ./profiler.sh -d 30 -f flamegraph.svg <pid>
  • 35-39. Flamegraph 101: here’s where your CPU cycles went. [Flame graph: width is % of CPU time, height is the stack; highlighted regions: GC, RocksDB, Kafka poll() loop, actual processing time]
  • 40. Understanding where you spend your time: Streaming applications are complex. Analyze the proportion of time spent fetching the data (including deserializing), processing the data, and sending the data (including serializing). CPU usage and load are poor capacity utilization metrics. Use the ratio of time spent to wall clock for capacity planning.
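The capacity-planning ratio on this slide can be worked through with a few numbers. The sketch below assumes you already have, from a profile or from your own timers, the seconds a stream thread spent fetching, processing, and sending during a measurement window; the function name `time_ratios` and its parameters are hypothetical.

```python
def time_ratios(fetch_s, process_s, send_s, wall_clock_s):
    """Share of wall-clock time spent in each stage, plus overall utilization."""
    if wall_clock_s <= 0:
        raise ValueError("wall_clock_s must be positive")
    busy_s = fetch_s + process_s + send_s
    return {
        "fetch": fetch_s / wall_clock_s,       # includes deserializing
        "process": process_s / wall_clock_s,
        "send": send_s / wall_clock_s,         # includes serializing
        "utilization": busy_s / wall_clock_s,  # time spent vs. wall clock
    }
```

With 15 s fetching, 30 s processing, and 15 s sending in a 120 s window, utilization is 0.5, so under these assumptions the thread could absorb roughly twice the load before saturating, whatever the CPU usage graph shows.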
  • 41. Detecting unexpected side effects, before: 23% of time spent seeking in RocksDB during cache flushes
  • 42. Detecting unexpected side effects, after: 5% of time spent in RocksDB
  • 43. Detecting unwanted side effects: Cache flushes were calling inefficient methods. Cache flushes were never a problem in testing. Production load caused the streams cache to fill up instantly (always flushing). Increasing the cache size gave us an order of magnitude improvement in performance.
  • 44-46. Helps understand I/O bottlenecks. [Flame graph highlights: time spent writing to socket; time spent generating SSL random data]
  • 47. Per-thread profiling: Stream threads process heterogeneous workloads. Important to understand which tasks may be the bottleneck. ./profiler.sh -d <time> -t -f out.svg <pid>
  • 49. Detecting lock contention: Time spent waiting on locks / timeouts does not show up in CPU usage. ./profiler.sh -d <time> -e lock -o svg=total -f out.svg <pid>
  • 51-52. Detecting lock contention: 130ms waiting on ConsumerCoordinator; 225ms waiting on metric initialization locks
  • 53. Detecting lock contention: Detecting locking issues has helped us fix timing bugs. Code was calling consumer.poll() with a timeout in the critical state restore loop, which becomes a problem when data comes in at a low rate (e.g. 1 per second). After the fix, state restore went from minutes to seconds.
  • 56-57. Allocation profiling: byte[] and ByteBuffer allocations
  • 58. Allocation profiling: Addressing unnecessary memory allocation helps reduce GC pressure (especially long-lived objects) and improve performance (large byte arrays can be expensive to initialize). Often seen at serialization / deserialization + compression stages. This helped us improve performance for small messages by an order of magnitude.
  • 60. Profiling container workloads: Run from inside the container (may still require host-level access; requires shipping profiling tools with your image) or run outside the container (requires copying the agent library into the container).
  • 61. Profiling on Google Kubernetes Engine: kubectl get pods -o wide (example output: NAME my-streams-app, READY 1/1, STATUS Running, RESTARTS 0, AGE 72m, IP 12.42.1.2, NODE gke-bc5762429427-default-pool-c91f8be8-cbkq); gcloud compute ssh <node-id>. Problems: home is mounted non-executable; cannot install packages directly on the host.
  • 62. GKE Toolbox: Everything needs to run inside a container. Enter the toolbox container (toolbox --bind $HOME:/host), then apt-get install anything you need… Compile async-profiler, or download a pre-compiled build, into the toolbox home. Note: libstdc++ in the target container must be compatible with the version compiled against. Copy the agent library to the host: cp ~/async-profiler/build/libasyncProfiler.so /host/libasyncProfiler.so
  • 63. Setting up async-profiler in GKE: Outside the toolbox, copy the agent library into the target Docker container; the path in the container must exactly match the path in the toolbox (e.g. /root/async-profiler/build): docker exec -u0 <container-id> mkdir -p /root/async-profiler/build; docker cp libasyncProfiler.so <container-id>:/root/async-profiler/build/libasyncProfiler.so ; then turn on the necessary kernel options (4.6 and above): echo 1 | sudo tee /proc/sys/kernel/perf_event_paranoid; echo 0 | sudo tee /proc/sys/kernel/kptr_restrict
  • 64. Running async-profiler in GKE: Inside the toolbox: ~/async-profiler/profiler.sh -d 30 -f /tmp/flamegraph.svg <pid> ; can’t find /tmp/flamegraph.svg? The output path is resolved inside the application container. Outside the toolbox: docker cp <container-id>:/tmp/flamegraph.svg ~/
  • 65. Do Try This At Home https://github.com/jvm-profiling-tools/async-profiler https://www.confluent.io/download/ https://slackpass.io/confluentcommunity https://www.confluent.io/blog
  • 66. Thank you! 25% off Kafka Summit SF 2019 with code ks19meetup. @gwenshap, gwen@confluent.io; @xvrl, xavier@confluent.io