Apache Kafka - Martin Podval

Apache Kafka is a distributed messaging system for publishing and subscribing to streams of records, organized into topics, in a fault-tolerant and scalable way. It is used for building real-time data pipelines and streaming applications. Producers write data to topics; each topic is split into partitions that are persisted to disk and replicated across brokers for fault tolerance. Consumers read from topics independently of producers and track their position by offset. Kafka handles large volumes of streaming data with low latency and high throughput.

What is Apache Kafka?
Messaging System
Distributed
Persistent and Replicable
Very fast - low latency - and scalable
Simple but highly configurable
Created at LinkedIn, open-sourced under the Apache Software Foundation (apache.org)

Data Streaming
New kind of data ...
● User or application data (events) streams
● Monitoring - App, System
● App Logging
● High volume

Data Streaming Cont’d
… you want to process
● Using various components
● Into a target form
● Map, reduce, shuffle
● Real time or batch

HP Service Virtualization Use Cases
● Processing of client message streams
● Real-time performance modeling
● Log aggregation

How To Solve It?
Producers and Consumers
● Distributed
● Decoupled
● Configurable
● Dynamic

Kafka Cluster
Brokers
● = Instances, Nodes
● Topics
● Partitions
● Replicas
ZooKeeper (ZK)
● Coordination

Kafka Topics
Commit Log
● Immutable
● Ordered
● Sequential Offset
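
The three properties above can be illustrated with a toy in-memory log. This is a minimal sketch, not Kafka's implementation: the `CommitLog` class and its method names are hypothetical, and a real broker stores records in segment files on disk.

```python
class CommitLog:
    """Toy append-only log: records are only appended, never modified."""

    def __init__(self):
        self._records = []

    def append(self, message):
        """Append a message and return the sequential offset assigned to it."""
        self._records.append(message)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records (offset, message) pairs starting at offset."""
        end = min(offset + max_records, len(self._records))
        return [(i, self._records[i]) for i in range(offset, end)]


log = CommitLog()
first = log.append("a")
second = log.append("b")
third = log.append("c")
tail = log.read(1)  # re-reading from any earlier offset is always possible
```

Because the log is immutable and ordered, a reader only needs to remember a single number, its offset, to resume where it left off.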

Kafka Topics Cont’d
Partitioned
Independently:
● Stored
● Produced
● Consumed
⇒ Scalable
Replicated
● On partition basis
● Different brokers
⇒ Fault Tolerant
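
A rough sketch of both ideas: mapping a message key to a partition, and placing each partition's replicas on different brokers. The function names, the CRC32 hash and the round-robin placement are assumptions made for illustration; Kafka's own partitioner and replica assignment differ in detail.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a key to a partition with a stable hash (CRC32 is a stand-in here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on distinct brokers, round-robin."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


layout = assign_replicas(num_partitions=3, brokers=["b0", "b1", "b2"],
                         replication_factor=2)
partition = choose_partition("user-1", 3)
```

With this layout, losing any single broker still leaves one copy of every partition, which is the fault-tolerance property the slide refers to.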

What Can I Do?
producer.write(topic_id, message);
consumer.read(topic_id, offset);

I Want To Produce
● Java/Scala client
● address of one or more brokers
● choose a topic where to produce
● highly configurable and tunable:
○ partitioner
○ number of acks (0 = async, no ack; 1 = leader only; -1/all = all in-sync replicas)
○ batching, buffer size, timeouts, retries, ...
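
As an illustration of these knobs, the sketch below collects a few producer settings in a dictionary and renders them as a properties file. The key names are the ones used by the newer Java producer and may differ from the client version this deck was written against; the broker addresses and all values are placeholders, and `to_properties` is a hypothetical helper.

```python
# Illustrative producer settings; addresses and values are placeholders.
producer_config = {
    "bootstrap.servers": "broker1:9092,broker2:9092",  # one or more brokers
    "acks": "all",              # wait for all in-sync replicas
    "retries": 3,               # resend on transient failures
    "batch.size": 16384,        # bytes collected per partition before sending
    "linger.ms": 5,             # how long to wait for a batch to fill
    "buffer.memory": 33554432,  # total bytes the producer may buffer
}


def to_properties(config):
    """Render the settings in key=value form, sorted by key."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


properties_text = to_properties(producer_config)
```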

I Want To Consume
High Level API
● Groups abstraction
○ To All, To One
○ To Some
● Stream API
● Stores positions to support fault tolerance
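
The group abstraction can be sketched as follows: every group sees all partitions of a topic ("to all"), while inside a group each partition is read by exactly one consumer ("to one"). The `assign_partitions` function and the round-robin strategy are assumptions for illustration, not the client's actual rebalancing algorithm.

```python
def assign_partitions(partitions, consumers):
    """Spread a topic's partitions over the consumers of one group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]
group_a = assign_partitions(partitions, ["a1", "a2"])  # load is shared
group_b = assign_partitions(partitions, ["b1"])        # one consumer gets all
```

Running several groups side by side gives publish-subscribe behaviour; adding consumers to a single group gives queue-like load sharing.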

I Want To Consume Cont’d
Low Level
● Java/Scala client
● Find the leader for a topic partition
● Calculate an offset
● Fetch messages
○ Re-consume if needed

I Want To Consume Cont’d
Delivery Semantics:
● At most once
● At least once
● Exactly once
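
The difference between the first two guarantees comes down to whether the consumer saves its position before or after processing a message. The simulation below is a minimal sketch with a hypothetical `consume` function that crashes once between the two steps.

```python
def consume(messages, commit_first, crash_at):
    """Process messages, crashing once at index crash_at between the two steps.

    commit_first=True  -> save the offset, then process (at most once)
    commit_first=False -> process, then save the offset (at least once)
    """
    processed = []
    committed = 0      # next offset to read after a restart
    crashed = False
    while committed < len(messages):
        offset = committed
        if commit_first:
            committed = offset + 1
            if offset == crash_at and not crashed:
                crashed = True
                continue  # crash before processing: the message is lost
            processed.append(messages[offset])
        else:
            processed.append(messages[offset])
            if offset == crash_at and not crashed:
                crashed = True
                continue  # crash before commit: the message is read again
            committed = offset + 1
    return processed


messages = ["m0", "m1", "m2"]
at_most_once = consume(messages, commit_first=True, crash_at=1)
at_least_once = consume(messages, commit_first=False, crash_at=1)
```

Exactly-once behaviour generally needs more than ordering the two steps, for example storing the offset atomically together with the output, or making processing idempotent.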

Kafka Internals - Disks
Avoid:
● GC
● Random disk access

Kafka Internals - Disks Cont’d
Disks are fast ...
… when properly used
● sequential access - read ahead, write behind
● rely on operating system
○ avoid heap, materialization and GC
● it’s more like file copy over network
It’s easy … with immutable topics

Kafka Internals - Replication
“In Sync” Replicas
● Replication factor on partition basis
● One leader + 0..n replicas
● Replicas are consumers
○ “In Sync” if they are not “too far” behind a leader
○ Batch sync
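
A sketch of the "not too far behind" rule, assuming lag is measured in messages. The function name and the `max_lag` parameter are hypothetical; depending on the version, a broker may judge lag by message count, by time, or both.

```python
def in_sync_replicas(leader_end_offset, follower_end_offsets, max_lag):
    """Return the followers whose log end offset is within max_lag of the leader."""
    return sorted(
        name
        for name, end_offset in follower_end_offsets.items()
        if leader_end_offset - end_offset <= max_lag
    )


# r1 is 2 messages behind, r2 is 300 behind; only r1 counts as in sync.
isr = in_sync_replicas(1000, {"r1": 998, "r2": 700}, max_lag=100)
```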

Kafka Internals - Replication Cont’d
Tunable Trade-Offs
● Producer’s write method:
○ Not blocked, async
○ Waits for leader ACK
○ Waits for all in-sync replicas
● Consumer pulls only committed messages
● Server’s minimum in-sync replicas
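
These trade-offs can be sketched as two small rules: whether a write goes through under a given acknowledgement mode, and up to which offset consumers may read. Both function names are hypothetical, and the sketch assumes the in-sync set includes the leader.

```python
def write_accepted(acks, isr_size, min_insync_replicas):
    """Decide whether a write goes through under the given acks mode.

    acks: 0 (no wait), 1 (leader only) or "all" (every in-sync replica).
    isr_size counts the leader as well.
    """
    if acks in (0, 1):
        return True
    if acks == "all":
        return isr_size >= min_insync_replicas
    raise ValueError(f"unsupported acks value: {acks!r}")


def high_watermark(isr_log_end_offsets):
    """Messages are committed, and visible to consumers, below this offset."""
    return min(isr_log_end_offsets)


durable_write_ok = write_accepted("all", isr_size=1, min_insync_replicas=2)
visible_up_to = high_watermark([10, 8, 9])
```

Raising the minimum in-sync replica count trades availability for durability: with too few replicas in sync, fully acknowledged writes are refused rather than stored on a single node.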

Performance
“Incredible”
Scales with:
● clients count, message size
● number of replicas, partitions or topics
Depends on network and disk throughput

Performance Cont’d
Our testing
● 3 nodes, leader + 2 replicas
● 500,000 msg/s (100-byte byte[] payloads)
● 400 Mbit/s - 1.2 Gbit/s network throughput
● end-to-end latency 2-3 ms
@see http://bit.ly/1FsIR9a
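
The figures above are roughly consistent with each other. The arithmetic below is one possible reading, not something the slides state: it assumes the lower bound is pure payload traffic into the leader and the upper bound adds one copy per replica, ignoring protocol overhead. The function name is hypothetical.

```python
def payload_mbit_per_s(messages_per_s, message_bytes):
    """Payload bandwidth in Mbit/s, ignoring protocol overhead."""
    return messages_per_s * message_bytes * 8 / 1_000_000


inbound = payload_mbit_per_s(500_000, 100)  # producer -> leader: 400.0 Mbit/s
# Assumption: the leader also ships one copy to each of the 2 replicas.
total = inbound * 3                         # 1200.0 Mbit/s, i.e. 1.2 Gbit/s
```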

Ease of Use
● No installation, just run a Java/Scala program
● Streams in files & dirs
● Transparent ZooKeeper
● Ecosystem

Cons
● Beta version
● Dependency on ZooKeeper
● The way it is written in Scala
● No easy way to remove messages

Apache Kafka, @MartinPodval, hpsv.cz
