Improving Kafka At-Least-Once Performance
Ying Zheng, Streaming Data Team
Uber
Kafka at Uber
● Use cases
○ General Pub-Sub
○ Stream Processing
○ Ingestion
○ Database Changelog Transport
○ Logging
● Scale
○ Trillion+ messages / day
○ Tens of thousands of topics
Kafka at-least-once delivery at Uber
● Started using at-least-once delivery in 2016
● More and more services are using Kafka to pass business critical messages
○ E.g., payment events, driver profile management, and trip updates
● Scope:
○ Hundreds of topics
○ Tens of thousands of messages per second
The performance issue
● Producer latency is important for at-least-once use cases
● Some use cases also require low end-to-end latency
● A simple performance benchmark shows that Kafka at-least-once performance doesn’t scale well:

Partitions per node   Throughput per node (MB/s)   P50 latency (ms)   P99 latency (ms)
4                     105                          4                  27
40                    64                           7                  27
800                   64                           18                 58
The first at-least-once cluster at Uber
● For a dedicated at-least-once cluster, the per-node traffic was ~ 30x lower than
the other Kafka clusters
● Users were manually approved
● There are some very large use cases (hundreds of thousands of messages per second) waiting, but the cluster could not handle that much traffic
The performance improvement project
● Make sure the at-least-once Kafka cluster can handle as much traffic as the regular clusters
● Allow at-least-once production in regular clusters
● Some of the improvements are also useful for the non-at-least-once use cases
The benchmark
● Simulate the production traffic pattern (message size / # of partitions / QPS of each
topic, sample data etc.)
● Each broker leads ~ 900 topic-partitions
● ~ 100 K messages / s per broker (as leader)
● ~ 65 MB/s per broker (as leader)
● Snappy compression
● Each topic is consumed by 4 consumers
Testing cluster
11 servers with the following configuration:

CPU         2.40 GHz, 20 cores, 40 hyperthreads
Memory      256 GB DDR4 ECC in 4 channels
OS          Linux Debian Jessie, kernel 4.4.78
JDK         Oracle 1.8.0_65
RAID card   1 GB write-back cache
HDD         22 disks in RAID-0, 40 TB space in total
Kafka       10.0.2
Scala       2.10
The benchmark result before the optimizations
P99 latency 60 ms ~ 80 ms
P95 latency 30 ms ~ 40 ms
P50 latency 14 ms ~ 15 ms
How does at-least-once work in Kafka?
● Producers and consumers only talk to the leader broker
● Follower brokers keep fetching from the leader
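[Diagram: a producer and a consumer connected to the leader broker; two follower brokers fetching from the leader]
1. Producer → leader broker: Produce
2. Follower brokers → leader broker: Fetch
3. Leader broker → follower brokers: Fetch response
4. Leader broker → producer: Produce response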
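The four steps can be illustrated with a toy model. This is a sketch only, not broker code: the names AckModel, PartitionState, and canAcknowledge are made up for illustration, and it assumes that every follower is in sync and that the producer waits for all of them. It shows why the produce response is tied to follower fetches: the leader learns that a follower holds the new batch only when that follower comes back with a fetch at the next offset.

```scala
// Toy model of an at-least-once produce: the leader answers the producer
// only after every in-sync follower has fetched past the produced batch.
object AckModel {
  // Log end offset (LEO) of the leader and of each follower,
  // keyed by a hypothetical broker id.
  final case class PartitionState(leaderLeo: Long, followerLeos: Map[Int, Long]) {
    // High watermark: everything below this offset is on every replica.
    def highWatermark: Long =
      if (followerLeos.isEmpty) leaderLeo
      else math.min(leaderLeo, followerLeos.values.min)

    // A fetch at `fetchOffset` tells the leader that the follower already
    // holds everything below that offset.
    def onFollowerFetch(brokerId: Int, fetchOffset: Long): PartitionState =
      copy(followerLeos = followerLeos.updated(brokerId, fetchOffset))

    // Step 4 can happen once the high watermark covers the produced batch.
    def canAcknowledge(requiredOffset: Long): Boolean =
      highWatermark >= requiredOffset
  }

  def main(args: Array[String]): Unit = {
    // Step 1: a batch is appended; the leader's LEO moves from 100 to 101.
    val afterProduce = PartitionState(101L, Map(2 -> 100L, 3 -> 100L))
    println(afterProduce.canAcknowledge(101L))
    // Steps 2-3: both followers fetch the batch, then fetch again at 101.
    val caughtUp = afterProduce.onFollowerFetch(2, 101L).onFollowerFetch(3, 101L)
    println(caughtUp.canAcknowledge(101L))
  }
}
```

In this model the producer's latency is bounded below by the time until the slowest follower issues its next fetch.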
Improvement #1: Increase consumer fetch size
● Most consumers are now using the default fetch.min.bytes setting, which is 1
byte
● In our production environment, each topic is typically consumed by 4
consumers:
○ The actual user of the topic
○ Mirror Maker x 2 (for failover / backup)
○ Auditing
● Setting fetch.min.bytes to 64KB reduces the number of consumer fetch
requests from ~ 30K to ~5K
● The P99 latency is reduced by about 10ms, and P50 latency is reduced by
about 5ms
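A minimal sketch of the consumer-side change, using only java.util.Properties. The object name FetchTuning, the broker address, and the group id are placeholders. fetch.min.bytes and fetch.max.wait.ms are standard consumer settings; the 64 KB value is the one from the slide, and 500 ms is the documented default wait, restated here to make the trade-off visible.

```scala
import java.util.Properties

object FetchTuning {
  // Consumer properties with a larger minimum fetch size.
  // The broker holds each fetch until 64 KB is available or
  // fetch.max.wait.ms elapses, whichever comes first.
  def consumerProps(bootstrap: String, groupId: String): Properties = {
    val p = new Properties()
    p.setProperty("bootstrap.servers", bootstrap)
    p.setProperty("group.id", groupId)
    p.setProperty("fetch.min.bytes", (64 * 1024).toString) // default is 1
    p.setProperty("fetch.max.wait.ms", "500")              // default value
    p
  }

  def main(args: Array[String]): Unit = {
    // "broker1:9092" and "audit" are placeholder values.
    val props = consumerProps("broker1:9092", "audit")
    println(props.getProperty("fetch.min.bytes"))
  }
}
```

On low-traffic topics a larger fetch.min.bytes can add up to fetch.max.wait.ms of consumer-side delay, so the value is a trade-off between broker load and end-to-end latency.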
Improvement #2: Garbage collection
● Young gen GC happens about once per second; each collection stops the world for 20 ms to 30 ms
● Larger young gen size helps reduce the GC overhead
● However, a very large heap (especially a large old gen) has some negative impact
● When the heap is smaller than 32 GB, the JVM uses 4-byte (compressed) object pointers instead of 8-byte pointers
Improvement #2: Garbage collection
From
-Xms36G -Xmx36G -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:NewRatio=2
To
-Xms22G -Xmx22G -XX:+UseG1GC -XX:NewSize=16G -XX:MaxNewSize=16G
-XX:InitiatingHeapOccupancyPercent=3
-XX:G1MixedGCCountTarget=1 -XX:G1HeapWastePercent=1
● P99 latency reduced by ~ 10ms, P50 latency reduced by ~ 0.5 to 1 ms
Improvement #3: Fix toString
● In Kafka broker code, some toString methods are declared like this:
    case class FetchMetadata(...) {
      override def toString =
        "[minBytes: " + fetchMinBytes + ", " +
        "onlyLeader:" + fetchOnlyLeader + ", "
        "onlyCommitted: " + fetchOnlyCommitted + ", "
        "partitionStatus: " + fetchPartitionStatus + "]"
    }
● The "onlyLeader" line ends without a "+", so the method body stops there. The last two lines become statements in the class body and are evaluated every time a FetchMetadata is constructed
● This String is only used in trace log
● The fix keeps the whole expression inside the method body, so it is built only when toString is called:
    case class FetchMetadata(...) {
      override def toString() = {
        "[minBytes: " + fetchMinBytes + ", " +
        "onlyLeader:" + fetchOnlyLeader + ", " +
        "onlyCommitted: " + fetchOnlyCommitted + ", " +
        "partitionStatus: " + fetchPartitionStatus + "]"
      }
    }
● This fix reduces P99 latency by a few ms
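The difference between the two declarations can be reproduced outside Kafka. The sketch below is a standalone illustration: ToStringPitfall, Status, Buggy, and Fixed are made-up names, and a counter stands in for the cost of rendering the partition status. When a line of the concatenation ends without a "+", Scala ends the method there and treats the following line as a statement of the class body, which runs in the constructor.

```scala
object ToStringPitfall {
  // Counts how often the (pretend) expensive rendering runs.
  var renders = 0

  final class Status {
    override def toString: String = { renders += 1; "status" }
  }

  // Buggy shape: the first line has no trailing `+`, so the method ends
  // there. The second line is a class-body statement that is evaluated
  // on every construction and then thrown away.
  class Buggy(val status: Status) {
    override def toString = "[a: 1, "
    "status: " + status + "]"
  }

  // Fixed shape: the whole expression lives inside the method body.
  class Fixed(val status: Status) {
    override def toString: String = {
      "[a: 1, " +
      "status: " + status + "]"
    }
  }

  def main(args: Array[String]): Unit = {
    new Buggy(new Status)
    println(s"renders after constructing Buggy: $renders")
    val before = renders
    new Fixed(new Status)
    println(s"extra renders after constructing Fixed: ${renders - before}")
  }
}
```

Constructing Buggy pays for the rendering even though nobody asks for the string, while Fixed pays only when toString is actually called.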
Improvement #4: Follower fetch protocol
Fetch Request (Version: 3) => {
  replica_id => INT32
  max_wait_time => INT32
  min_bytes => INT32
  max_bytes => INT32
  topics => [
    topic => STRING
    partitions => [
      partition => INT32
      fetch_offset => INT64
      max_bytes => INT32
    ] * (# of partitions)
  ] * (# of topics)
}
Fetch Response (Version: 3) => {
  throttle_time_ms => INT32
  responses => [
    topic => STRING
    partition_responses => [
      partition_header => {
        partition => INT32
        error_code => INT16
        high_watermark => INT64
      }
      record_set => {RECORDS}
    ] * (# of partitions)
  ] * (# of topics)
}
Improvement #4: Follower fetch protocol
● Low producer latency = high follower fetch frequency
● In each fetch request and response, all topic-partitions are repeated, even
when there is no data
● In the benchmark, each broker is following 1800 topic-partitions
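As a rough illustration of that repetition, the fixed-size per-partition fields in the version 3 schemas can be added up. The sketch counts only the fields listed in the schemas (partition, fetch_offset, and max_bytes in the request; partition, error_code, and high_watermark in the response). It leaves out topic names, array length prefixes, and the record set, so the result is a lower bound. The name FetchOverhead is made up for this example.

```scala
object FetchOverhead {
  // Wire sizes of the fixed-width types used in the schemas.
  val Int16 = 2
  val Int32 = 4
  val Int64 = 8

  // Request: partition + fetch_offset + max_bytes.
  val requestBytesPerPartition: Int = Int32 + Int64 + Int32
  // Response header: partition + error_code + high_watermark.
  val responseBytesPerPartition: Int = Int32 + Int16 + Int64

  // Lower bound on the bytes repeated in one fetch round trip.
  def fixedBytesPerRoundTrip(partitions: Int): Long =
    partitions.toLong * (requestBytesPerPartition + responseBytesPerPartition)

  def main(args: Array[String]): Unit = {
    // 1800 followed topic-partitions, as in the benchmark.
    println(fixedBytesPerRoundTrip(1800))
  }
}
```

For 1800 partitions this comes to 30 bytes × 1800 = 54,000 bytes per round trip before any topic name or record is counted, and the cost is paid on every fetch regardless of how many partitions have new data.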
Improvement #4: Follower fetch protocol
● The QPS of the production topic-partitions at Uber:
QPS range      # of topic-partitions   Percentage (%)
< 0.01         18277                   67.21
0.1 - 1        1607                    5.91
1 - 10         1989                    7.31
10 - 100       2230                    8.20
100 - 1000     2565                    9.43
> 1000         524                     1.93
Improvement #4: Follower fetch protocol
● Skipping empty topic-partitions in the follower fetch response
● Reduced P99 latency from ~35ms to ~30ms
● Reduced P50 latency from 8ms to 7ms
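The idea of skipping empty topic-partitions can be sketched as a filter over the per-partition response entries. This is an illustration only: SkipEmpty and PartitionData are hypothetical names, and the slides do not say how the modified broker handles partitions whose high watermark or error code changed without new records, so the sketch ignores those cases.

```scala
object SkipEmpty {
  // Hypothetical in-memory form of one partition_responses entry.
  final case class PartitionData(
    topic: String,
    partition: Int,
    highWatermark: Long,
    records: Array[Byte]
  )

  // Keep only the partitions that actually carry records.
  def nonEmpty(responses: Seq[PartitionData]): Seq[PartitionData] =
    responses.filter(_.records.nonEmpty)

  def main(args: Array[String]): Unit = {
    val all = Seq(
      PartitionData("trips", 0, 10L, Array[Byte](1, 2, 3)),
      PartitionData("trips", 1, 5L, Array.empty[Byte]),
      PartitionData("payments", 0, 7L, Array.empty[Byte])
    )
    println(nonEmpty(all).map(p => s"${p.topic}-${p.partition}").mkString(", "))
  }
}
```

With most production topic-partitions nearly idle, such a filter would leave only a small fraction of the entries in a typical response.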
Improvement #5: Speed up (de)serialization
● Kafka generates ~2.6KB temporary objects for each topic-partition in each
fetch cycle, while the fetch protocol only needs about 100 bytes
● The ByteBuffer is converted into an object tree, and then into a HashMap
● Primitive int and long values are boxed into Integer and Long
Improvement #5: Speed up (de)serialization (cont.)
● The solution:
○ Avoid generating the intermediate data structure
○ Use primitive types instead of boxed types
○ Cache topic names and topic-partition objects
● P99 latency is reduced by a few ms
● P50 latency is slightly improved
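A sketch of the first two points, assuming the per-partition layout of the version 3 fetch request (partition INT32, fetch_offset INT64, max_bytes INT32) preceded by an INT32 element count. The names PrimitiveDecode and Partitions are made up; this is not the patch described in the talk. The entries are read straight from the ByteBuffer into primitive arrays, with no intermediate tree, no map, and no boxing.

```scala
import java.nio.ByteBuffer

object PrimitiveDecode {
  // Decoded partitions held in parallel primitive arrays.
  final class Partitions(
    val ids: Array[Int],
    val fetchOffsets: Array[Long],
    val maxBytes: Array[Int]
  )

  // Reads an INT32 count followed by that many
  // (partition INT32, fetch_offset INT64, max_bytes INT32) entries.
  def decode(buf: ByteBuffer): Partitions = {
    val n = buf.getInt()
    val ids = new Array[Int](n)
    val offsets = new Array[Long](n)
    val max = new Array[Int](n)
    var i = 0
    while (i < n) {
      ids(i) = buf.getInt()
      offsets(i) = buf.getLong()
      max(i) = buf.getInt()
      i += 1
    }
    new Partitions(ids, offsets, max)
  }

  def main(args: Array[String]): Unit = {
    val buf = ByteBuffer.allocate(4 + 2 * 16)
    buf.putInt(2)
    buf.putInt(0).putLong(100L).putInt(1048576)
    buf.putInt(1).putLong(250L).putInt(1048576)
    buf.flip()
    val p = decode(buf)
    println(p.ids.zip(p.fetchOffsets).mkString(", "))
  }
}
```

Each partition costs 16 bytes in the arrays, with three array allocations per request instead of several objects per partition.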
Improvement #6: Fewer threads
● All the network threads and I/O threads in a Kafka broker share one ArrayBlockingQueue
● The current production setting:
○ 32 network threads
○ 32 I/O threads
Improvement #6: Fewer threads
● Reducing the number of threads from 32 / 32 to 16 / 16
● P50 latency is reduced by about 1 ms
● Further improvement may require a different threading model or a more efficient concurrent queue (e.g., Disruptor?)
Improvement #7: Lock contention in purgatory
● Purgatory is the data structure for delayed operations (e.g. produce / fetch)
● Purgatory is protected with a read-write lock shared by all the io threads
● Each second, there are tens of thousands of operations added to and removed
from Purgatory
● The solution: sharding purgatory into 256 partitions
● P50 latency is reduced by another 1 ms
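The sharding idea can be sketched as a map split into independently locked partitions. This is a minimal illustration, not the broker's purgatory: ShardedPurgatory and Sharded are hypothetical names, and a plain key-value map stands in for the delayed-operation bookkeeping. Each key hashes to one shard, so two threads contend only when they touch the same shard.

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock
import scala.collection.mutable

object ShardedPurgatory {
  final class Sharded[K, V](shardCount: Int) {
    // One lock and one map per shard instead of a single global pair.
    private val locks = Vector.fill(shardCount)(new ReentrantReadWriteLock())
    private val maps = Vector.fill(shardCount)(mutable.HashMap.empty[K, V])

    // Non-negative shard index derived from the key's hash.
    private def shardOf(key: K): Int = (key.## & Int.MaxValue) % shardCount

    def put(key: K, value: V): Unit = {
      val i = shardOf(key)
      val w = locks(i).writeLock()
      w.lock()
      try maps(i).update(key, value) finally w.unlock()
    }

    def remove(key: K): Option[V] = {
      val i = shardOf(key)
      val w = locks(i).writeLock()
      w.lock()
      try maps(i).remove(key) finally w.unlock()
    }

    def get(key: K): Option[V] = {
      val i = shardOf(key)
      val r = locks(i).readLock()
      r.lock()
      try maps(i).get(key) finally r.unlock()
    }
  }

  def main(args: Array[String]): Unit = {
    // 256 shards, as on the slide.
    val purgatory = new Sharded[String, Long](256)
    purgatory.put("trips-0", 101L)
    println(purgatory.get("trips-0"))
    println(purgatory.remove("trips-0"))
  }
}
```

With 256 shards and 16 or 32 I/O threads, the chance that two threads need the same lock at the same moment becomes small, provided the keys spread evenly across shards.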
Improvement #8: I/O optimizations
● The P99 latency jumps a lot after several minutes
● This happens after the memory is used up by Linux page cache
● Disk usage also increases
Improvement #8: I/O optimizations
● Most Kafka I/O operations are handled in memory by Linux page cache
● File system metadata changes are handled by the non-volatile memory on
RAID card
● Normally, Kafka threads should not be blocked by HDD I/O
● However, there are some exceptions ...
Improvement #8: I/O optimizations
● There is an optional flush
● Write operations may have to load file system metadata (e.g. inode) and the last page of the file from disk (I’m not a Linux expert, so not 100% sure about this)
● Rolling a segment file needs dentry and inode
● The solution:
○ Turn off flush
○ Periodically touch the last 2 pages of each index and segment file
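Touching the tail of a file can be sketched as reading one byte from each of its last pages, which asks the kernel to have those pages in the page cache. The sketch assumes a 4096-byte page and uses plain file reads; PageToucher and touchTail are made-up names, and the slides do not show how the actual implementation reads the pages or how often it runs.

```scala
import java.io.RandomAccessFile
import java.nio.file.{Files, Path}

object PageToucher {
  // Assumed page size; the real value depends on the platform.
  val PageSize = 4096

  // Reads one byte from each of the last `pages` pages of the file.
  // Returns the number of pages touched.
  def touchTail(path: Path, pages: Int = 2): Int = {
    val raf = new RandomAccessFile(path.toFile, "r")
    try {
      val len = raf.length()
      if (len == 0) 0
      else {
        val lastPage = (len - 1) / PageSize
        val firstPage = math.max(0L, lastPage - pages + 1)
        var p = firstPage
        var touched = 0
        while (p <= lastPage) {
          raf.seek(p * PageSize)
          raf.read()
          touched += 1
          p += 1
        }
        touched
      }
    } finally raf.close()
  }

  def main(args: Array[String]): Unit = {
    // A temporary file stands in for a segment or index file.
    val f = Files.createTempFile("segment", ".log")
    Files.write(f, new Array[Byte](10000))
    println(touchTail(f))
    Files.delete(f)
  }
}
```

A background task could call touchTail for each active segment and index file at a fixed interval, so that the next append finds the tail pages already in memory.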
Improvement #8: I/O optimizations
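● Kafka index lookup is needed in both consumer fetch and follower fetch
● Normally, the followers and consumers only look up the recent offsets
● A plain binary search is not cache-friendly: its first probes land in the middle of the index, on pages that are rarely touched otherwise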
Improvement #8: I/O optimizations
● More cache-friendly binary search: look up only in the warm area, if possible
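One way to read this is the following sketch, which is an interpretation and not the code from the talk. The index is modeled as a sorted array of offsets, and the "warm area" as its last warmEntries entries (a hypothetical parameter). If the target is at or beyond the first warm entry, the search stays inside the warm area. Otherwise it falls back to the rest of the index.

```scala
object WarmSearch {
  // Largest index i in [lo0, hi0] with entries(i) <= target, or -1.
  private def floorIndex(entries: Array[Long], target: Long, lo0: Int, hi0: Int): Int = {
    var lo = lo0
    var hi = hi0
    var ans = -1
    while (lo <= hi) {
      val mid = lo + (hi - lo) / 2
      if (entries(mid) <= target) { ans = mid; lo = mid + 1 }
      else hi = mid - 1
    }
    ans
  }

  // Looks up the last entry <= target, probing cold entries only when
  // the target is older than everything in the warm area.
  def lookup(entries: Array[Long], target: Long, warmEntries: Int): Int = {
    if (entries.isEmpty) -1
    else {
      val warmStart = math.max(0, entries.length - warmEntries)
      if (entries(warmStart) <= target)
        floorIndex(entries, target, warmStart, entries.length - 1)
      else
        floorIndex(entries, target, 0, warmStart - 1)
    }
  }

  def main(args: Array[String]): Unit = {
    val index = Array.tabulate(10)(i => i * 10L) // offsets 0, 10, ..., 90
    println(lookup(index, 75L, 3)) // answered from the warm area
    println(lookup(index, 25L, 3)) // falls back to the cold part
  }
}
```

A lookup for a recent offset then touches only the last few pages of the index, which are the ones kept hot by ongoing appends and fetches.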
The overall performance improvement
● Benchmark
○ P99 latency: ~70 ms ⇨ less than 20 ms
○ P50 latency: ~15 ms ⇨ 3-4 ms
● Production at-least-once cluster
○ P99 latency: ~10 ms ⇨ 1 ms
○ P50 latency: ~5 ms ⇨ ~0.5 ms
● Production log cluster (acks=0)
○ CPU usage: ~50% ⇨ ~16%
Ideas for further improvements
● Refer to topics by an integer ID instead of a string name
● Register followers / consumers on the leader, instead of repeating all the topic-partitions in each fetch request
● The ArrayBlockingQueue between network threads and I/O threads is still a bottleneck. We should consider either a different threading model or a more efficient alternative, such as Disruptor
● Save data in a distributed file system instead of on local disk
Proprietary and confidential © 2017 Uber Technologies, Inc. All rights reserved. No part of this
document may be reproduced or utilized in any form or by any means, electronic or mechanical,
including photocopying, recording, or by any information storage or retrieval systems, without
permission in writing from Uber. This document is intended only for the use of the individual or entity
to whom it is addressed and contains information that is privileged, confidential or otherwise exempt
from disclosure under applicable law. All recipients of this document are notified that the information
contained herein includes proprietary and confidential information of Uber, and recipient may not
make use of, disseminate, or in any way disclose this document or any of the enclosed information
to any person other than employees of addressee to the extent necessary for consultations with
authorized personnel of Uber.

Improving Kafka at-least-once performance at Uber

  • 1. Improving Kafka At-Least-Once Performance
      Ying Zheng, Streaming Data Team, Uber
  • 2. Kafka at Uber
      ● Use cases
        ○ General Pub-Sub
        ○ Stream Processing
        ○ Ingestion
        ○ Database Changelog Transport
        ○ Logging
      ● Scale
        ○ Trillion+ messages / day
        ○ Tens of thousands of topics
  • 3. Kafka at-least-once delivery at Uber
      ● Started using at-least-once delivery in 2016
      ● More and more services are using Kafka to pass business-critical messages
        ○ E.g., payment events, driver profile management, and trip updates
      ● Scope:
        ○ Hundreds of topics
        ○ Tens of thousands of messages per second
  • 4. The performance issue
      ● Producer latency is important for at-least-once use cases
      ● Some use cases also require low end-to-end latency
      ● A simple performance benchmark shows that Kafka at-least-once performance doesn’t scale well:

        # partitions per node   Throughput per node (MB/s)   P50 latency (ms)   P99 latency (ms)
        4                       105                          4                  27
        40                      64                           7                  27
        800                     64                           18                 58
  • 5. The first at-least-once cluster at Uber
      ● For a dedicated at-least-once cluster, the per-node traffic was ~ 30x lower than the other Kafka clusters
      ● Users were manually approved
      ● Some very large use cases (hundreds of thousands of messages per second) were waiting, but the cluster could not handle that much traffic
  • 6. The performance improvement project
      ● Make sure the at-least-once Kafka cluster can handle as much traffic as the regular clusters
      ● Allow at-least-once production in regular clusters
      ● Some of the improvements are also useful for the non-at-least-once use cases
  • 7. The benchmark
      ● Simulate the production traffic pattern (message size / # of partitions / QPS of each topic, sample data etc.)
      ● Each broker leads ~ 900 topic-partitions
      ● ~ 100 K messages per broker (as leader)
      ● ~ 65 MB/s per broker (as leader)
      ● Snappy compression
      ● Each topic is consumed by 4 consumers
  • 8. Testing cluster
      11 servers with the following configuration:

        CPU         2.40GHz, 20 cores, 40 hyperthreads
        Memory      256GB DDR4 ECC in 4 channels
        OS          Linux Debian Jessie, Kernel 4.4.78
        JDK         Oracle 1.8.0_65
        RAID card   1GB write-back cache
        HDD         22 disks in RAID-0, 40TB space in total
        Kafka       10.0.2
        Scala       2.10
  • 9. The benchmark result before the optimizations

        P99 latency   60 ms ~ 80 ms
        P95 latency   30 ms ~ 40 ms
        P50 latency   14 ms ~ 15 ms
  • 10. How does at-least-once work in Kafka?
      ● Producers and consumers only talk to the leader broker
      ● Follower brokers keep fetching from the leader
      [Diagram: Producer and Leader Broker (1. Produce / 4. Response); each of two Follower Brokers and the Leader Broker (2. Fetch / 3. Response); Consumer]
  • 11. Improvement #1: Increase consumer fetch size
      ● Most consumers are now using the default fetch.min.bytes setting, which is 1 byte
      ● In our production environment, each topic is typically consumed by 4 consumers:
        ○ The actual user of the topic
        ○ Mirror Maker x 2 (for failover / backup)
        ○ Auditing
      ● Setting fetch.min.bytes to 64KB reduces the number of consumer fetch requests from ~ 30K to ~ 5K
      ● The P99 latency is reduced by about 10ms, and P50 latency is reduced by about 5ms
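The setting described on slide 11 can be sketched as a plain consumer configuration. This is a minimal illustration, not Uber's actual configuration: the class name `ConsumerFetchConfig`, the broker address and the group id are hypothetical, and the `fetch.max.wait.ms` value shown is simply the client default, included because it bounds how long the broker may hold a fetch while waiting for `fetch.min.bytes` to accumulate.

```java
import java.util.Properties;

// Hypothetical helper that builds consumer properties with a larger minimum fetch size.
final class ConsumerFetchConfig {
    // 64 KB, the value quoted on the slide.
    static final int FETCH_MIN_BYTES = 64 * 1024;

    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // The broker answers a fetch only once this many bytes are available
        // (or the wait below expires), so fewer, larger fetch requests are issued.
        props.setProperty("fetch.min.bytes", Integer.toString(FETCH_MIN_BYTES));
        // Assumed value (the client default): upper bound on the broker-side wait.
        props.setProperty("fetch.max.wait.ms", "500");
        return props;
    }
}
```

The resulting `Properties` object would be passed to a Kafka consumer constructor. The trade-off is that a consumer on a low-traffic topic may see a message up to `fetch.max.wait.ms` later than it would with the 1-byte default.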
  • 12. Improvement #2: Garbage collection
      ● Young gen GC happens about once per second; each time, it stops the world for 20ms to 30ms
      ● A larger young gen size helps reduce the GC overhead
      ● But a very large heap size (especially a large old gen) has some negative impact
      ● When heap size < 32GB, Java uses 4-byte pointers instead of 8-byte pointers
  • 13. Improvement #2: Garbage collection
      From:
        -Xms36G -Xmx36G -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:NewRatio=2
      To:
        -Xms22G -Xmx22G -XX:+UseG1GC -XX:NewSize=16G -XX:MaxNewSize=16G
        -XX:InitiatingHeapOccupancyPercent=3 -XX:G1MixedGCCountTarget=1 -XX:G1HeapWastePercent=1
      ● P99 latency reduced by ~ 10ms, P50 latency reduced by ~ 0.5 to 1 ms
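  • 14. Improvement #3: Fix toString
      ● In Kafka broker code, some toString methods are declared like this:

          case class FetchMetadata(...) {
            override def toString = "[minBytes: " + fetchMinBytes + ", " +
                                    "onlyLeader:" + fetchOnlyLeader + ", "
                                    "onlyCommitted: " + fetchOnlyCommitted + ", "
                                    "partitionStatus: " + fetchPartitionStatus + "]"
          }

      ● Assuming the line breaks shown here match the original slide, the second and third lines of the expression do not end with "+", so Scala ends the method body after the second line and evaluates the last two lines as class-body statements every time a FetchMetadata is constructed
      ● This String is only used in trace log
      ● The fix wraps the expression in braces, so it is evaluated only when toString() is called:

          case class FetchMetadata(...) {
            override def toString() = {
              "[minBytes: " + fetchMinBytes + ", " +
              "onlyLeader:" + fetchOnlyLeader + ", "
              "onlyCommitted: " + fetchOnlyCommitted + ", "
              "partitionStatus: " + fetchPartitionStatus + "]"
            }
          }

      ● This fix reduces P99 latency by a few ms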
  • 15. Improvement #4: Follower fetch protocol

          Fetch Response (Version: 3) => {
            throttle_time_ms => INT32
            responses => [
              topic => STRING
              partition_responses => [
                partition_header => {
                  partition => INT32
                  error_code => INT16
                  high_watermark => INT64
                }
                record_set => {RECORDS}
              ] * (# of partitions)
            ] * (# of topics)
          }

          Fetch Request (Version: 3) => {
            replica_id => INT32
            max_wait_time => INT32
            min_bytes => INT32
            max_bytes => INT32
            topics => [
              topic => STRING
              partitions => [
                partition => INT32
                fetch_offset => INT64
                max_bytes => INT32
              ] * (# of partitions)
            ] * (# of topics)
          }
  • 16. Improvement #4: Follower fetch protocol
      ● Low producer latency = high follower fetch frequency
      ● In each fetch request and response, all topic-partitions are repeated, even when there is no data
      ● In the benchmark, each broker is following 1800 topic-partitions
  • 17. Improvement #4: Follower fetch protocol
      ● The QPS of the production topic-partitions at Uber:

        QPS range     # of topic-partitions   Percentage
        <0.01         18277                   67.21
        0.1 - 1       1607                    5.91
        1 - 10        1989                    7.31
        10 - 100      2230                    8.20
        100 - 1000    2565                    9.43
        > 1000        524                     1.93
  • 18. Improvement #4: Follower fetch protocol
      ● Skipping empty topic-partitions in the follower fetch response
      ● Reduced P99 latency from ~35ms to ~30ms
      ● Reduced P50 latency from 8ms to 7ms
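The idea on slide 18 can be illustrated with a small filter over a per-partition response map. This is only a sketch under stated assumptions: the class `FollowerFetchResponseFilter`, the string partition keys and the `byte[]` record payloads are hypothetical stand-ins for the broker's real response types, and the slides do not say how the actual change treats the `error_code` and `high_watermark` fields of the partitions that are skipped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for building a follower fetch response.
final class FollowerFetchResponseFilter {
    // Returns a copy that keeps only the partitions carrying at least one byte of
    // record data, preserving the original iteration order.
    static Map<String, byte[]> dropEmpty(Map<String, byte[]> recordsByPartition) {
        Map<String, byte[]> nonEmpty = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : recordsByPartition.entrySet()) {
            byte[] records = e.getValue();
            if (records != null && records.length > 0) {
                nonEmpty.put(e.getKey(), records);
            }
        }
        return nonEmpty;
    }
}
```

With the QPS distribution from slide 17, where about two thirds of the topic-partitions are nearly idle, most of the 1800 followed partitions would drop out of a typical response.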
  • 19. Improvement #5: Speed up (de)serialization
      ● Kafka generates ~2.6KB of temporary objects for each topic-partition in each fetch cycle, while the fetch protocol only needs about 100 bytes
      ● The ByteBuffer is converted into an object tree, and then converted into a HashMap
      ● int and long values are boxed into Integer and Long
  • 20. Improvement #5: Speed up (de)serialization (cont.)
      ● The solution:
        ○ Avoid generating the intermediate data structure
        ○ Use primitive types instead of boxed types
        ○ Cache topic names and topic-partition objects
      ● P99 latency is reduced by a few ms
      ● P50 latency is slightly improved
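The first two points of the solution can be sketched by decoding the per-partition part of a fetch request straight into primitive arrays. This is an illustrative sketch, not the broker's code: the class `PartitionFetchEntries` and its methods are hypothetical, and the layout assumed is an INT32 count followed by the (partition INT32, fetch_offset INT64, max_bytes INT32) entries from the Version 3 request on slide 15.

```java
import java.nio.ByteBuffer;

// Hypothetical holder for the partition entries of one topic in a fetch request.
final class PartitionFetchEntries {
    final int[] partitions;
    final long[] fetchOffsets;
    final int[] maxBytes;

    private PartitionFetchEntries(int n) {
        partitions = new int[n];
        fetchOffsets = new long[n];
        maxBytes = new int[n];
    }

    // Reads the entries directly from the buffer: no intermediate object tree,
    // no HashMap, and no Integer or Long boxing.
    static PartitionFetchEntries decode(ByteBuffer buf) {
        int n = buf.getInt();
        PartitionFetchEntries out = new PartitionFetchEntries(n);
        for (int i = 0; i < n; i++) {
            out.partitions[i] = buf.getInt();
            out.fetchOffsets[i] = buf.getLong();
            out.maxBytes[i] = buf.getInt();
        }
        return out;
    }

    // Writes the same layout; each entry takes 4 + 8 + 4 = 16 bytes.
    static ByteBuffer encode(int[] partitions, long[] fetchOffsets, int[] maxBytes) {
        ByteBuffer buf = ByteBuffer.allocate(4 + partitions.length * 16);
        buf.putInt(partitions.length);
        for (int i = 0; i < partitions.length; i++) {
            buf.putInt(partitions[i]);
            buf.putLong(fetchOffsets[i]);
            buf.putInt(maxBytes[i]);
        }
        buf.flip();
        return buf;
    }
}
```

Three arrays per topic replace one small object per field per partition, which is the kind of allocation the slide attributes the ~2.6KB of temporary objects to.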
  • 21. Improvement #6: Fewer threads
      ● All the network threads and IO threads in the Kafka broker share one ArrayBlockingQueue
      ● The current production setting:
        ○ 32 network threads
        ○ 32 io threads
  • 22. Improvement #6: Fewer threads
      ● Reducing the number of threads from 32 / 32 to 16 / 16
      ● P50 latency is reduced by about 1 ms
      ● Further improvement may require a different threading model or a more efficient concurrent queue (e.g. disruptor?)
  • 23. Improvement #7: Lock contention in purgatory
      ● Purgatory is the data structure for delayed operations (e.g. produce / fetch)
      ● Purgatory is protected with a read-write lock shared by all the io threads
      ● Each second, there are tens of thousands of operations added to and removed from Purgatory
      ● The solution: sharding purgatory into 256 partitions
      ● P50 latency is reduced by another 1 ms
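The sharding idea on slide 23 can be sketched as a map split into independently locked shards. This is a simplified illustration, not Kafka's purgatory: the class `ShardedPurgatory` is hypothetical, it stores a single pending operation per key, and it leaves out timeouts and completion checks. What it shows is only the locking structure, one read-write lock per shard instead of one for the whole structure.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sharded container for delayed operations, keyed by a watch key.
final class ShardedPurgatory<K, V> {
    private static final class Shard<A, B> {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final Map<A, B> pending = new HashMap<>();
    }

    private final Shard<K, V>[] shards;

    @SuppressWarnings("unchecked")
    ShardedPurgatory(int shardCount) {
        shards = (Shard<K, V>[]) new Shard[shardCount];
        for (int i = 0; i < shardCount; i++) {
            shards[i] = new Shard<>();
        }
    }

    // floorMod keeps the index non-negative even for negative hash codes.
    private Shard<K, V> shardFor(K key) {
        return shards[Math.floorMod(key.hashCode(), shards.length)];
    }

    void add(K key, V operation) {
        Shard<K, V> s = shardFor(key);
        s.lock.writeLock().lock();
        try {
            s.pending.put(key, operation);
        } finally {
            s.lock.writeLock().unlock();
        }
    }

    // Removes and returns the pending operation, or null if there is none.
    V remove(K key) {
        Shard<K, V> s = shardFor(key);
        s.lock.writeLock().lock();
        try {
            return s.pending.remove(key);
        } finally {
            s.lock.writeLock().unlock();
        }
    }

    boolean contains(K key) {
        Shard<K, V> s = shardFor(key);
        s.lock.readLock().lock();
        try {
            return s.pending.containsKey(key);
        } finally {
            s.lock.readLock().unlock();
        }
    }
}
```

Constructed with `new ShardedPurgatory<String, Runnable>(256)`, two io threads contend only when their keys hash to the same shard, rather than on every add and remove.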
  • 24. Improvement #8: I/O optimizations
      ● The P99 latency jumps a lot after several minutes
      ● This happens after the memory is used up by Linux page cache
      ● Disk usage also increases
  • 25. Improvement #8: I/O optimizations
      ● Most Kafka I/O operations are handled in memory by Linux page cache
      ● File system metadata changes are handled by the non-volatile memory on RAID card
      ● Normally, Kafka threads should not be blocked by HDD I/O
      ● However, there are some exceptions ...
  • 26. Improvement #8: I/O optimizations
      ● There is an optional flush
      ● Write operations may have to load file system metadata (e.g. inode) and the last page of the file from disk (I’m not a Linux expert, so I’m not 100% sure about this)
      ● Rolling a segment file needs dentry and inode
      ● The solution:
        ○ Turn off flush
        ○ Periodically touch the last 2 pages of each index and segment file
  • 27. Improvement #8: I/O optimizations
      ● Kafka index lookup is needed in both consumer fetch and follower fetch
      ● Normally, the followers and consumers only look up the recent offsets
      ● Binary search is not cache-friendly ...
  • 28. Improvement #8: I/O optimizations ● More cache-friendly binary search: look up only in the warm area, if possible
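The lookup on slide 28 can be sketched as a binary search that first tries the tail of the index. This is a minimal sketch under stated assumptions: the class `WarmIndexLookup`, the sorted in-memory `long[]` standing in for the index file, and the `warmEntries` parameter are all hypothetical, and how the real change sizes the warm area is not given in the slides. A plain binary search always probes the middle and quarter points of the whole index, pages that are otherwise rarely read and so may have been evicted from the page cache. Restricting the search to the most recent entries keeps the probes on recently written pages.

```java
// Hypothetical offset-index lookup that prefers the recently written ("warm") tail.
final class WarmIndexLookup {
    // Largest i in [lo, hi] with offsets[i] <= target, or -1 if there is none.
    static int floorSearch(long[] offsets, int lo, int hi, long target) {
        int result = -1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (offsets[mid] <= target) {
                result = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }

    // Searches only the last warmEntries entries when the target falls inside
    // them, and falls back to the older ("cold") part of the index otherwise.
    static int lookup(long[] offsets, int warmEntries, long target) {
        int n = offsets.length;
        if (n == 0) {
            return -1;
        }
        int firstWarm = Math.max(0, Math.min(n - 1, n - warmEntries));
        if (target >= offsets[firstWarm]) {
            return floorSearch(offsets, firstWarm, n - 1, target);
        }
        return floorSearch(offsets, 0, firstWarm - 1, target);
    }
}
```

Since followers and consumers normally look up recent offsets, as slide 27 notes, the first branch would be the common case and the cold pages would be touched only by lagging readers.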
  • 29. The overall performance improvement
      ● Benchmark
        ○ P99 latency: ~70 ms ⇨ less than 20 ms
        ○ P50 latency: ~15 ms ⇨ 3-4 ms
      ● Production at-least-once cluster
        ○ P99 latency: ~10 ms ⇨ 1 ms
        ○ P50 latency: ~5 ms ⇨ ~0.5 ms
      ● Production log cluster (acks=0)
        ○ CPU usage: ~50% ⇨ ~16%
  • 30. Ideas for further improvements
      ● Refer to topics by an integer ID instead of a string
      ● Register followers / consumers on the leader, instead of repeating all the topic-partitions in each fetch request
      ● The ArrayBlockingQueue between network threads and io threads is still a bottleneck. We should consider either a different threading model or a more efficient alternative, like disruptor
      ● Save data in a distributed file system instead of on local disk
  • 31. Proprietary and confidential © 2017 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified that the information contained herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any of the enclosed information to any person other than employees of addressee to the extent necessary for consultations with authorized personnel of Uber.