
Kafka Streams (Fifth Elephant 2018)


  2. Agenda
     01 Kafka Streams Architecture
     02 Active / Standby Tasks
     03 Interactive Queries
     04 One-Hop Queries
     05 Queryable State during Restoration
     06 Storage Policies
     07 Rack-Aware Task Allocation
     08 Finite Retention in Changelog Topic
     09 Conclusion
     10 Q/A
  3. Kafka Streams Architecture
  4. Active / Standby Tasks
  5. Interactive Queries
  6. One-Hop Queries
  7. One-Hop Queries: Implementation Details
     • A leader is elected among the Kafka Streams application instances using the Apache Curator leader-election recipe.
     • The leader pushes any changes to StreamsMetadata to ZooKeeper.
     • Clients watch the ZooKeeper node and are notified of metadata changes.
     • Each client builds a partition -> (node, port) map and uses it to fetch the state associated with a given key.
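The routing step above can be sketched in plain Java. This is a minimal illustration, not the actual implementation: `OneHopRouter`, the host strings, and the use of `String.hashCode` in place of Kafka's murmur2 partitioner are all assumptions made for the example.

```java
import java.util.Map;

public class OneHopRouter {
    // Kafka's default partitioner for keyed records is murmur2(key) mod partitions;
    // String.hashCode is used here only to keep the sketch dependency-free.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Look up which instance owns the partition for this key, using the
    // partition -> (host:port) map the client built from ZooKeeper metadata.
    static String hostFor(String key, Map<Integer, String> partitionToHost) {
        int p = partitionFor(key, partitionToHost.size());
        return partitionToHost.get(p); // query this instance's state store directly
    }

    public static void main(String[] args) {
        Map<Integer, String> metadata = Map.of(
                0, "node-a:7070",
                1, "node-b:7070",
                2, "node-c:7070");
        System.out.println("owner of user-42: " + hostFor("user-42", metadata));
    }
}
```

Because the client resolves the owner locally, a lookup needs exactly one network hop to the owning instance, rather than asking an arbitrary instance and being redirected.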
  8. Queryable State during Restoration
     • Currently, a state store is queryable only when its task is in the RUNNING state.
     • When a primary task fails, its standby is promoted to primary. Before it can start processing messages and queries, it must rebuild state by reading from the local state store and the changelog Kafka topic.
     • Because a standby can be arbitrarily far behind the primary, the amount of changelog to replay can be huge. During this time the state store remains non-queryable.
     • We changed this so that the store is queryable even while the task is in the RESTORATION (PARTITION_ASSIGNED) state.
     • This increases the availability of microservices built on top of Kafka Streams.
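The relaxed rule can be sketched as a small state-gated store. `RestorableStore` and its `TaskState` enum are hypothetical names for illustration; the real change lives inside Kafka Streams' task lifecycle and store code.

```java
import java.util.HashMap;
import java.util.Map;

public class RestorableStore {
    enum TaskState { CREATED, RESTORING, RUNNING }

    private final Map<String, String> store = new HashMap<>();
    private TaskState state = TaskState.CREATED;

    void transitionTo(TaskState next) { state = next; }

    void put(String key, String value) { store.put(key, value); }

    // Upstream behavior gates reads on RUNNING only; the change described on
    // the slide widens this to RESTORING as well. Reads during restoration
    // may be slightly stale, but the service stays available.
    String get(String key) {
        if (state != TaskState.RUNNING && state != TaskState.RESTORING) {
            throw new IllegalStateException("store not queryable in state " + state);
        }
        return store.get(key);
    }
}
```

The trade-off is staleness versus availability: a store answering queries mid-restoration can return values from before the latest changelog offsets, which is acceptable for many read paths.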
  9. Storage Policies
     • RocksDB is used for storing state in Kafka Streams, and RocksDB works best with SSDs.
     • As state grows, it becomes costly to keep all of it on SSDs.
     • We need to move not-so-recent data to HDDs, where it is still queryable but no longer occupies SSD space.
     • We implemented configurable storage policies, such as an Archival policy and a TTL policy.
     • The Archival policy moves data that has not been touched for a (configurable) long time from SSD to HDD.
     • On top of this, users can also enable the TTL policy to completely remove state from HDD that has not been modified for a long time.
  10. Storage Policies: Implementation Details
     • In addition to the actual data in the store, we record the time at which each key was last modified. This information is also stored in RocksDB.
     • The index data in RocksDB has the format <timestamp>#<key> -> key.
     • Since RocksDB supports efficient range queries, this layout lets us find keys that have not been modified for a long time.
     • This is what lets us enforce the storage policies.
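The index described above can be emulated with a sorted map standing in for RocksDB's ordered key space. `ArchivalIndex` and the zero-padded key encoding are illustrative assumptions, not the actual on-disk format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ArchivalIndex {
    // TreeMap plays the role of RocksDB's lexicographically ordered key space.
    private final TreeMap<String, String> index = new TreeMap<>();

    // Zero-padding the timestamp makes lexicographic order match numeric order,
    // so "<timestamp>#<key>" entries sort by modification time.
    static String indexKey(long ts, String key) {
        return String.format("%020d#%s", ts, key);
    }

    // Record that `key` was modified at time `ts`. A real implementation would
    // also delete the key's previous index entry here.
    void touch(long ts, String key) {
        index.put(indexKey(ts, key), key);
    }

    // Range scan: every key last modified strictly before `cutoff` is a
    // candidate for archival (SSD -> HDD) or TTL deletion.
    List<String> staleKeys(long cutoff) {
        return new ArrayList<>(index.headMap(indexKey(cutoff, "")).values());
    }
}
```

A policy sweep then becomes a single ordered scan over a bounded prefix of the index, rather than a full scan of the primary data.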
  11. Rack-Aware Task Allocation
     • Currently, Kafka Streams does not support rack awareness when allocating tasks.
     • This means both the primary and standby tasks for a partition can land on the same rack.
     • That results in poor fault tolerance when a whole rack fails.
     • We added a config to StreamsConfig called "RACK_ID_CONFIG", which lets the StickyPartitionAssignor assign tasks such that, where possible, no two replica tasks are on the same rack.
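The placement rule can be sketched as: prefer a standby host on a different rack from the primary, and fall back to a same-rack host only when no other rack is available. `RackAwareAssigner` and `pickStandby` are hypothetical names for illustration; the real logic sits inside the assignor.

```java
import java.util.Map;

public class RackAwareAssigner {
    // hostRacks maps each instance's host to the rack id it advertised
    // (via the RACK_ID_CONFIG setting described above).
    static String pickStandby(String primaryHost, Map<String, String> hostRacks) {
        String primaryRack = hostRacks.get(primaryHost);
        String sameRackFallback = null;
        for (Map.Entry<String, String> e : hostRacks.entrySet()) {
            if (e.getKey().equals(primaryHost)) continue;
            if (!e.getValue().equals(primaryRack)) {
                return e.getKey(); // different rack: survives a whole-rack failure
            }
            sameRackFallback = e.getKey(); // same rack: use only if nothing better
        }
        return sameRackFallback;
    }
}
```

The "if possible" in the slide matters: with only one rack available, the assignor still places a standby rather than leaving the task unreplicated.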
  12. Finite Retention in Changelog Topic
     • Changelog topics are log-compacted Kafka topics with infinite retention.
     • As application state grows, changelog topics grow too, and infinite retention consumes ever more storage on the Kafka cluster.
     • To reduce this pressure, we implemented a mechanism that allows a configurable retention time on the changelog Kafka topic.
     • When a standby task fails and is restarted on a different machine, it first copies state directly from the machine running the primary task, and then replays the changelog Kafka topic to bring the standby's state up to date.
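At the topic level, finite retention alongside compaction might look like the following Kafka topic configuration. The values are illustrative; the slide does not give the exact settings used.

```properties
# Keep compaction (latest value per key survives), but also delete
# segments older than 7 days, since an old changelog suffix can be
# reconstructed by copying state from the primary's machine.
cleanup.policy=compact,delete
retention.ms=604800000
```

With `cleanup.policy=compact,delete` the broker both compacts the log and expires segments past `retention.ms`, which is what bounds the changelog's footprint on the cluster.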
  13. Conclusion
     • The Kafka Streams library lets application developers write streaming applications.
     • Kafka Streams borrowed a few ideas from Apache Samza and added new features such as standby tasks and interactive query support.
     • We implemented several features that improve performance (one-hop queries), availability (queryable state during the RESTORATION phase), and fault tolerance (rack-aware task allocation).