Kafka at Peak Performance
Kafka At LinkedIn
1100+ Kafka brokers
Over 32,000 topics
350,000+ Partitions
875 Billion messages per day
185 Terabytes In
675 Terabytes Out
Peak Load (whole site)
– 10.5 Million messages/sec
– 18.5 Gigabits/sec Inbound
– 70.5 Gigabits/sec Outbound
1800+ Kafka brokers
Over 79,000 topics
1,130,000+ Partitions
1.3 Trillion messages per day
330 Terabytes In
1.2 Petabytes Out
Peak Load (single cluster)
– 2 Million messages/sec
– 4.7 Gigabits/sec Inbound
– 15 Gigabits/sec Outbound
What Will We Talk About?
Picking Your Hardware
Monitoring the Cluster
Triaging Broker Performance Problems
Conclusion
What’s Important To You?
Message Retention - Disk size
Message Throughput - Network capacity
Producer Performance - Disk I/O
Consumer Performance - Memory
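A rough disk-sizing sketch for the first mapping, using purely illustrative numbers (100 MB/sec inbound, 7 day retention, 3 replicas):

100 MB/sec * 86,400 sec/day ≈ 8.6 TB/day
8.6 TB/day * 7 days ≈ 60 TB of log segments
60 TB * 3 replicas ≈ 180 TB of raw disk across the cluster, before overhead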
Go Wide
Kafka is well-suited to horizontal scaling
RAIS - Redundant Array of Inexpensive Servers
Also helps with CPU utilization
– Kafka needs to decompress and recompress every message batch
– KIP-31 will help with this by eliminating recompression
Don’t co-locate Kafka
Disk Layout
RAID
– Can survive a single disk failure (not RAID 0)
– Provides the broker with a single log directory
– Eats up disk I/O
JBOD
– Gives Kafka all the disk I/O available
– Broker is not smart about balancing partitions
– If one disk fails, the entire broker stops
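With JBOD, each mount point becomes its own log directory in the broker configuration; a minimal sketch, with hypothetical mount points:

# server.properties - one log directory per data disk
log.dirs=/mnt/disk1/kafka,/mnt/disk2/kafka,/mnt/disk3/kafka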
Amazon EBS performance works!
Operating System Tuning
Filesystem Options
– EXT or XFS
– Using unsafe mount options
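As an example of the unsafe mount options in question, an ext4 data volume might trade crash safety for throughput like this (device and mount point are hypothetical):

/dev/sdb1  /mnt/kafka  ext4  noatime,data=writeback,nobarrier  0 0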
Virtual Memory
– Swappiness
– Dirty Pages
Networking
Java
Only use JDK 8 now
Keep heap size small
– Even our largest brokers use a 6 GB heap
– Save the rest for page cache
Garbage Collection - G1 all the way
– Basic tuning only
– Watch for humongous allocations
Buy The Book!
Early Access available now.
Covers all aspects of Kafka,
from setup to client
development to ongoing
administration and
troubleshooting.
Also discusses stream
processing and other use
cases.
Kafka Cluster Sizing
How big for your local cluster?
– How much disk space do you have?
– How much network bandwidth do you have?
– CPU, memory, disk I/O
How big for your aggregate cluster?
– In general, multiply the number of brokers by the number of local clusters
– May have additional concerns with lots of consumers
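One way to turn these questions into a broker count is to size for the most constrained resource; the numbers here are illustrative assumptions:

Disk:    180 TB of replicated data / 10 TB usable per broker = 18 brokers
Network: 20 Gbps peak outbound / (10 Gbps NIC * 50% target utilization) = 4 brokers
Take the larger answer (18 here) and round up for headroom and failures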
Topic Configuration
Partition Counts for Local
– Many theories on how to do this correctly, but the answer is “it depends”
– How many consumers do you have?
– Do you have specific partition requirements?
– Keeping partition sizes manageable
Partition Counts for Aggregate
– Multiply the number of partitions in a local cluster by the number of local clusters
– Periodically review partition counts in all clusters
Message Retention
– If aggregate is where you really need the messages, only retain them in local clusters long enough to cover mirror maker problems
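For example, creating a local-cluster topic with an explicit partition count and a short retention override (names and values are hypothetical):

kafka-topics.sh --zookeeper zk.local:2181 --create --topic page-views \
    --partitions 16 --replication-factor 3 --config retention.ms=86400000

Here retention.ms=86400000 keeps one day locally, on the assumption that the aggregate cluster holds the long-term copy.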
Possible Broker Improvements
Namespaces
– Namespace topics by datacenter
– Eliminate local clusters and just have aggregate
– Significant hardware savings
JBOD Fixes
– Intelligent partition assignment
– Admin tools to move partitions between mount points
– Broker should not fail completely with a single disk failure
Administrative Improvements
Multiple cluster management
– Topic management across clusters
– Visualization of mirror maker paths
Better client monitoring
– Burrow for consumer monitoring
– No open source solution for producer monitoring (audit)
End-to-end availability monitoring
Monitoring The Foundation
CPU Load
Network inbound and outbound
Filehandle usage for Kafka
Disk
– Free space - where you write logs, and where Kafka stores messages
– Free inodes
– I/O performance - at least average wait and percent utilization
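These disk checks map directly onto standard tools, for example:

df -h /data/kafka    # free space where Kafka stores messages (path hypothetical)
df -i /data/kafka    # free inodes
iostat -x 5          # per-device await and %util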
Garbage Collection
Broker Ground Rules
Tuning
– Stick (mostly) with the defaults
– Set default cluster retention as appropriate
– Default partition count should be at least the number of brokers
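Those two defaults live in the broker configuration; an illustrative sketch (values are assumptions, not recommendations):

# server.properties
# num.partitions should be at least the number of brokers in the cluster
log.retention.hours=72
num.partitions=12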
Monitoring
– Watch the right things
– Don’t try to alert on everything
Triage and Resolution
– Solve problems, don’t mask them
Too Much Information!
Monitoring teams hate Kafka
– Per-Topic metrics
– Per-Partition metrics
– Per-Client metrics
Capture as much as you can
– Many metrics are useful while triaging an issue
Clients want metrics on their own topics
Only alert on what is needed to signal a problem
Broker Monitoring
Bytes In and Out, Messages In
– Why not messages out?
Partitions
– Count and Leader Count
– Under Replicated and Offline
Threads
– Network pool, Request pool
– Max Dirty Percent
Requests
– Rates and times - total, queue, local, and send
Topic Monitoring
Bytes In, Bytes Out
Messages In, Produce Rate, Produce Failure Rate
Fetch Rate, Fetch Failure Rate
Partition Bytes
Log End Offset
– Why bother?
– KIP-32 will make this unnecessary
Quota Throttling
Provide this to your customers for them to alert on
Client Monitoring
For consumers, use Burrow
– Monitor all partitions for all consumers
– Provides an easy-to-digest “good, warning, bad” state, with detail available
– Fast and free
Producers are a little harder
– Several internal implementations of message auditing
– The community needs a good open source standard
Cluster availability monitoring
– kafka-monitoring is coming soon from LinkedIn!
All The Best Ops People…
Know more of what is happening than their customers
Are proactive
Fix bugs, not work around them
This applies to our developers too!
Anticipating Trouble
Trend cluster utilization and growth over time
Use default configurations for quotas and retention to require customers to talk to you
Monitor request times
– If you can develop a consistent baseline, this is an early warning
Under Replicated Partitions
The number of partitions that are not fully replicated within the cluster
Also referred to as “replica lag”
Primary indicator of problems within the cluster
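The cluster-wide count comes from the UnderReplicatedPartitions bean (listed with the broker sensors at the end); the affected partitions can also be listed directly with the stock admin tooling (ZooKeeper string hypothetical):

kafka-topics.sh --zookeeper zk.example.com:2181 --describe --under-replicated-partitions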
Broker Performance Checks
Are you still running 0.8?
Are all the brokers in the cluster working?
Are the network interfaces saturated?
– Reelect partition leaders (tooling sketched after this list)
– Rebalance partitions in the cluster
– Spread out traffic more (increase partitions or brokers)
Is the CPU utilization high? (especially iowait)
– Is another process competing for resources?
– Look for a bad disk
Do you have really big messages?
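The leader reelection and rebalancing steps map onto the stock admin tools; a sketch, with a hypothetical ZooKeeper string and reassignment plan file:

kafka-preferred-replica-election.sh --zookeeper zk.example.com:2181
kafka-reassign-partitions.sh --zookeeper zk.example.com:2181 \
    --reassignment-json-file plan.json --execute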
Kafka’s OK, Now What?
If Kafka is working properly, it’s probably a client issue
– Don’t throw it over the fence. Help your customers understand
Common producer issues
– Batch size and linger time
– Receive and send buffers
– Sync vs. async, and acknowledgements
Common consumer issues
– Garbage collection problems
– Min fetch bytes and max wait time
– Not enough partitions
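As a starting point for the client settings above, with purely illustrative values:

# producer
batch.size=65536
linger.ms=10
acks=all
send.buffer.bytes=1048576
receive.buffer.bytes=1048576

# consumer
fetch.min.bytes=1024
fetch.max.wait.ms=500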
One Ecosystem
Kafka can scale to millions of messages per second, and more
– Operations must scale the cluster appropriately
– Developers must use the right tuning and go parallel
Few problems are owned by only one side
– Expanding partitions often requires coordination
– Applications that need higher reliability drive cluster configurations
Either we work together, or we fail separately
Would You Like To Know More?
Presentations: http://www.slideshare.net/toddpalino
– More Datacenters, More Problems
– Kafka As A Service
– Always download the originals for slide notes!
Blog Posts: https://engineering.linkedin.com/blog
– Development and SRE blogs on Kafka and other topics
LinkedIn Open Source: https://github.com/linkedin/streaming
– Burrow Consumer Monitoring - https://github.com/linkedin/Burrow
– Kafka Admin Tools - https://github.com/linkedin/kafka-tools
Getting Involved With Kafka
http://kafka.apache.org
Join the mailing lists
– users@kafka.apache.org
– dev@kafka.apache.org
irc.freenode.net - #apache-kafka
Meetups
– Apache Kafka - http://www.meetup.com/http-kafka-apache-org
– Bay Area Samza - http://www.meetup.com/Bay-Area-Samza-Meetup/
Contribute code
Data @ LinkedIn is Hiring!
Streams Infrastructure
– Kafka pub/sub ecosystem
– Stream Processing Platform built on Apache Samza
– Next Generation change capture technology (incubating)
LinkedIn
– Strong commitment to open source
– Do cool things and work with awesome people
Join us in working on cutting edge stream processing infrastructures
– Please contact kparamasivam@linkedin.com
– Software developers and Site Reliability Engineers at all levels
JDK Options
Heap Size       -Xmx6g -Xms6g
Metaspace       -XX:MetaspaceSize=96m -XX:MinMetaspaceFreeRatio=50
                -XX:MaxMetaspaceFreeRatio=80
G1 Tuning       -XX:+UseG1GC -XX:MaxGCPauseMillis=20
                -XX:InitiatingHeapOccupancyPercent=35
                -XX:G1HeapRegionSize=16M
GC Logging      -Xloggc:/path/to/logs/gc.log -verbose:gc
                -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
                -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution
Error Handling  -XX:-HeapDumpOnOutOfMemoryError
                -XX:ErrorFile=/path/to/logs/hs_err.log
OS Tuning Parameters
Networking:
net.core.rmem_default = 124928
net.core.rmem_max = 2048000
net.core.wmem_default = 124928
net.core.wmem_max = 2048000
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_max_tw_buckets = 262144
net.ipv4.tcp_max_syn_backlog = 1024
OS Tuning Parameters (cont.)
Virtual Memory
vm.oom_kill_allocating_task = 1
vm.max_map_count = 200000
vm.swappiness = 1
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 500
vm.dirty_ratio = 60
vm.dirty_background_ratio = 5
Kafka Broker Sensors
kafka.server:name=BytesInPerSec,type=BrokerTopicMetrics
kafka.server:name=BytesOutPerSec,type=BrokerTopicMetrics
kafka.server:name=MessagesInPerSec,type=BrokerTopicMetrics
kafka.server:name=PartitionCount,type=ReplicaManager
kafka.server:name=LeaderCount,type=ReplicaManager
kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager
kafka.server:name=RequestHandlerAvgIdlePercent,type=KafkaRequestHandlerPool
kafka.controller:name=ActiveControllerCount,type=KafkaController
kafka.controller:name=OfflinePartitionsCount,type=KafkaController
kafka.log:name=max-dirty-percent,type=LogCleanerManager
kafka.network:name=NetworkProcessorAvgIdlePercent,type=SocketServer
kafka.network:name=RequestsPerSec,request=*,type=RequestMetrics
kafka.network:name=RequestQueueTimeMs,request=*,type=RequestMetrics
kafka.network:name=LocalTimeMs,request=*,type=RequestMetrics
kafka.network:name=RemoteTimeMs,request=*,type=RequestMetrics
kafka.network:name=ResponseQueueTimeMs,request=*,type=RequestMetrics
kafka.network:name=ResponseSendTimeMs,request=*,type=RequestMetrics
kafka.network:name=TotalTimeMs,request=*,type=RequestMetrics
Kafka Broker Sensors - Topics
kafka.server:name=BytesInPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=BytesOutPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=MessagesInPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=TotalProduceRequestsPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=FailedProduceRequestsPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=TotalFetchRequestsPerSec,type=BrokerTopicMetrics,topic=*
kafka.server:name=FailedFetchRequestsPerSec,type=BrokerTopicMetrics,topic=*
kafka.log:type=Log,name=LogEndOffset,topic=*,partition=*