
Apache Kafka - Martin Podval


  1. Apache Kafka
     @MartinPodval, hpsv.cz
  2. What is Apache Kafka?
     Messaging system
     Distributed
     Persistent and replicable
     Very fast (low latency) and scalable
     Simple but highly configurable
     By LinkedIn, open sourced under apache.org
  3. Data Streaming
     New kind of data ...
     ● Streams of user or application data (events)
     ● Monitoring - App, System
     ● App logging
     ● High volume
  4. Data Streaming Cont’d
     … you want to process
     ● Using various components
     ● Into a target form
     ● Map, reduce, shuffle
     ● Real time or batch
  5. HP Service Virtualization Use Cases
     Processing of client message streams
     Real-time performance modeling
     Log aggregation
  6. How To Solve It?
     Producers and Consumers
     ● Distributed
     ● Decoupled
     ● Configurable
     ● Dynamic
  7. Kafka Cluster
     Brokers
     ● = Instances, Nodes
     ● Topics
     ● Partitions
     ● Replicas
     ZK (ZooKeeper)
     ● Coordination
  8. Kafka Topics
     Commit Log
     ● Immutable
     ● Ordered
     ● Sequential offset
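The three commit log properties on this slide can be illustrated with a small in-memory model. This is only a sketch: the `CommitLog` class and its `append` and `read` methods are hypothetical names chosen for illustration and are not part of any Kafka client.

```python
class CommitLog:
    """Append-only log: records are added at the end and never modified."""

    def __init__(self):
        self._records = []

    def append(self, message: bytes) -> int:
        """Store a message at the end of the log and return its offset."""
        self._records.append(message)
        return len(self._records) - 1

    def read(self, offset: int, max_messages: int = 10) -> list:
        """Return (offset, message) pairs in order, starting at `offset`."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        end = min(offset + max_messages, len(self._records))
        return [(i, self._records[i]) for i in range(offset, end)]


log = CommitLog()
first = log.append(b"a")
second = log.append(b"b")
third = log.append(b"c")

# Reading never removes anything, so any reader can replay from any offset.
replayed = log.read(1)
```

Because reads do not consume records, the only state a reader needs is the offset it wants to continue from.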
  9. Kafka Topics Cont’d
     Partitioned
     Independently:
     ● Stored
     ● Produced
     ● Consumed
     ⇒ Scalable
     Replicated
     ● On partition basis
     ● Different brokers
     ⇒ Fault tolerant
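A rough sketch of both ideas on this slide, under stated assumptions: `choose_partition` uses CRC32 as a stand-in hash (not necessarily the hash a real Kafka partitioner uses), and `replica_brokers` is a simplified placement rule invented for illustration. The partition count and broker names are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4              # hypothetical partition count
BROKERS = ["b0", "b1", "b2"]    # hypothetical broker names


def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition; equal keys always map to the same partition."""
    return zlib.crc32(key) % num_partitions


def replica_brokers(partition: int, brokers: list, replication_factor: int) -> list:
    """Place each replica of a partition on a different broker (simplified rule)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the broker count")
    return [brokers[(partition + i) % len(brokers)] for i in range(replication_factor)]


partitions = defaultdict(list)
events = [
    (b"user-1", "login"),
    (b"user-2", "login"),
    (b"user-1", "click"),
    (b"user-1", "logout"),
]
for key, value in events:
    partitions[choose_partition(key)].append((key, value))

# Order is preserved per key because one key always lands in one partition.
user1_events = [v for k, v in partitions[choose_partition(b"user-1")] if k == b"user-1"]
placement = replica_brokers(1, BROKERS, 2)
```

Ordering is only guaranteed inside a single partition, which is why the choice of key matters when related events must stay in sequence.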
  10. What Can I Do?
      producer.write(topic_id, message);
      consumer.read(topic_id, offset);
  11. I Want To Produce
      ● Java/Scala client
      ● address of one or more brokers
      ● choose a topic to produce to
      ● highly configurable and tunable:
        ○ partitioner
        ○ number of acks (async=0, master=1, replicas=1+?)
        ○ batching, buffer size, timeouts, retries, ...
  12. I Want To Consume
      High Level API
      ● Groups abstraction
        ○ To All, To One
        ○ To Some
      ● Stream API
      ● Stores positions to support fault tolerance
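One reading of the groups abstraction, sketched below: every group receives the whole topic, while inside a group each partition goes to exactly one member. Under that reading, "To All" corresponds to every consumer being in its own group, "To One" to all consumers sharing a single group, and "To Some" to a mix. The `assign_partitions` function and the group and member names are hypothetical, and the round-robin rule is a simplification.

```python
def assign_partitions(partitions: list, consumers: list) -> dict:
    """Spread a topic's partitions over one group's consumers, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]
groups = {
    "billing": ["billing-1", "billing-2"],  # two members share the work
    "audit": ["audit-1"],                   # a single member reads everything
}

# Each group gets its own full assignment of all partitions.
plan = {group: assign_partitions(partitions, members) for group, members in groups.items()}
```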
  13. I Want To Consume Cont’d
      Low Level
      ● Java/Scala client
      ● Find the leader for a topic partition
      ● Calculate an offset
      ● Fetch messages
        ○ Re-consume if needed
  14. I Want To Consume Cont’d
      Delivery semantics:
      ● At most once
      ● At least once
      ● Exactly once
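The difference between the first two semantics comes down to whether the consumer saves its position before or after processing a message. The simulation below is a sketch with hypothetical names (`consume`, `Crash`, `crash_at`); it injects one failure between the two steps and then restarts from the saved position. The last lines show one common way to approximate exactly-once effects, assuming the sink can deduplicate by offset.

```python
class Crash(Exception):
    """Simulated consumer failure."""


def consume(log, start, handled, commit_first, crash_at=None):
    """Process log[start:] and return the last committed position.

    commit_first=True  -> commit, then process (at most once)
    commit_first=False -> process, then commit (at least once)
    """
    committed = start
    try:
        for offset in range(start, len(log)):
            if commit_first:
                committed = offset + 1
                if offset == crash_at:
                    raise Crash()
                handled.append((offset, log[offset]))
            else:
                handled.append((offset, log[offset]))
                if offset == crash_at:
                    raise Crash()
                committed = offset + 1
    except Crash:
        pass  # the process dies; only the committed position survives
    return committed


log = ["m0", "m1", "m2"]

at_most = []
position = consume(log, 0, at_most, commit_first=True, crash_at=1)
consume(log, position, at_most, commit_first=True)    # restart: m1 is skipped

at_least = []
position = consume(log, 0, at_least, commit_first=False, crash_at=1)
consume(log, position, at_least, commit_first=False)  # restart: m1 is repeated

# Idempotent sink keyed by offset: duplicates overwrite themselves.
exactly_once_view = dict(at_least)
```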
  15. Kafka Internals - Disks
      Avoid:
      ● GC
      ● Random disk access
  16. Kafka Internals - Disks Cont’d
      Disks are fast ...
      … when properly used
      ● sequential access - read ahead, write behind
      ● rely on operating system
        ○ avoid heap, materialization and GC
      ● it’s more like file copy over network
      It’s easy … with immutable topics
  17. Kafka Internals - Replication
      “In Sync” Replicas
      ● Replication factor on partition basis
      ● One leader + 0..n replicas
      ● Replicas are consumers
        ○ “In Sync” if they are not “too far” behind a leader
        ○ Batch sync
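The "not too far behind" rule can be pictured as a lag check against the leader's log end. This sketch assumes the lag is measured in messages and uses a hypothetical `max_lag` threshold; real brokers apply their own configurable criteria, which the slide does not specify.

```python
def in_sync_replicas(leader_end_offset: int, follower_offsets: dict, max_lag: int) -> list:
    """Return the followers whose lag behind the leader is within max_lag messages."""
    return sorted(
        replica
        for replica, offset in follower_offsets.items()
        if leader_end_offset - offset <= max_lag
    )


# Hypothetical state: the leader has written 1000 messages.
followers = {"broker-2": 998, "broker-3": 400}
isr = in_sync_replicas(1000, followers, max_lag=10)
```

Here `broker-3` drops out of the in-sync set until it catches up, while `broker-2` stays in.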
  18. Kafka Internals - Replication Cont’d
      Tunable Trade-Offs
      ● Producer’s write method:
        ○ Not blocked, async
        ○ Waits for master ACK
        ○ Waits for all in-sync replicas
      ● Consumer pulls only committed messages
      ● Server’s minimum in-sync replicas
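Two of these trade-offs can be sketched as small functions. The names `high_watermark` and `write_allowed` are hypothetical, and the model assumes that "committed" means present on every in-sync replica and that the minimum in-sync replica setting is only enforced when the producer waits for all in-sync replicas.

```python
def high_watermark(leader_end: int, isr_follower_ends: list) -> int:
    """Offsets below this value exist on every in-sync replica.

    Consumers are only served messages below the high watermark.
    """
    return min([leader_end] + list(isr_follower_ends))


def write_allowed(acks, isr_size: int, min_isr: int) -> bool:
    """Reject writes that wait for all replicas when too few are in sync.

    acks: 0 (async), 1 (leader only) or "all" (all in-sync replicas).
    isr_size counts the leader as well.
    """
    return acks != "all" or isr_size >= min_isr


# The leader holds offsets 0..9, one in-sync follower has only 0..6.
readable_up_to = high_watermark(10, [7])
```

Waiting for all in-sync replicas costs latency but means an acknowledged message survives the loss of the leader.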
  19. Performance
      “Incredible”
      Scales with:
      ● client count, message size
      ● number of replicas, partitions or topics
      Depends on network and disk throughput
  20. Performance Cont’d
      Our testing
      ● 3 nodes, master + 2 replicas
      ● 500,000 msg/s (100-byte messages)
      ● 400 Mbit/s - 1.2 Gbit/s network throughput
      ● end-to-end latency 2-3 ms
      See http://bit.ly/1FsIR9a
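A quick back-of-the-envelope check of these numbers. The slide does not say how the throughput range was measured, so reading the lower bound as the raw payload rate and the upper bound as that rate multiplied by three copies (master plus two replicas) is an assumption.

```python
MESSAGES_PER_SECOND = 500_000
MESSAGE_BYTES = 100
COPIES = 3  # master + 2 replicas (assumed to explain the upper bound)

# Payload rate of the producer stream alone, in megabits per second.
payload_mbit_per_s = MESSAGES_PER_SECOND * MESSAGE_BYTES * 8 / 1_000_000

# The same stream counted once per copy, in gigabits per second.
total_gbit_per_s = payload_mbit_per_s * COPIES / 1000
```

The payload rate works out to 400 Mbit/s and three copies to 1.2 Gbit/s, which matches both ends of the reported range.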
  21. Ease of Use
      ● No installation, just run a Java/Scala program
      ● Streams in files & dirs
      ● Transparent ZooKeeper
      ● Ecosystem
  22. Cons
      ● Beta version
      ● Dependency on ZooKeeper
      ● The way it is written in Scala
      ● No easy way to remove messages
  23. Questions?
