Disaster Recovery and High Availability with Kafka, SRM and MM2

In this talk, we will present Streams Replication Manager, a new open source Kafka mirroring solution designed specifically to provide disaster recovery and high availability for Kafka. We will describe and demo various replication topologies and recovery strategies using SRM and associated tooling. Finally, we will provide an update on the ongoing work to make this engine available for the Apache Kafka community as MirrorMaker2 (KIP-382).

Published in: Technology

  1. Kafka Disaster Recovery
     Abdelkrim Hadjidj, Senior Data Streaming Specialist
  2. © 2019 Cloudera, Inc. All rights reserved.
     Quick intro
     • Senior Specialist Solution Engineer at Cloudera
     • Focus on CDF offering
       ● Edge Management & IoT (MiNiFi, CEM)
       ● Flow Management (NiFi, Registry)
       ● Stream Processing (Kafka, KStreams, SMM, SR, …)
     • Founder of Future of Data Paris Meetup http://tiny.cc/fodp
     • Founder of Solutions Engineers of Paris http://tiny.cc/PSE
     @ahadjidj
  3. Kafka Disaster Recovery options
     • Multiple DC*: zero RPO
     • Dual ingest: zero RPO
     • Mirroring**: very low RPO
     * Stretch cluster on geographically distributed DC is not recommended
     ** Replication is used for internal broker replication
     [Diagram: broker and data placement across DC1, DC2 and DC3 for each option]
  4. Agenda
     • From MM to MM2 and SRM
     • Active Passive Architecture
     • Active Active Architectures
     • Other use cases
     • Monitoring
     • Q&A
  5. Mirror Maker use cases
     • Aggregation
     • Data Deployment
     • Segmentation
     • Acquisitions & mergers
     [Diagrams: Mirror Maker (MM) linking Kafka clusters K1, K2, K3 across DC1, DC2, DC3, with producers (P) and consumers (C)]
  6. Mirror Maker use cases
     [Diagram labels: Tracking, Queuing, Tracking Aggregate, Queuing Aggregate, MM, HDFS, producers (P), consumers (C)]
  7. Mirror Maker limitations for Disaster Recovery
     • Static Whitelists and Blacklists
     • Configuration sync
     • Manual Topic Naming to avoid Cycles
     • Scalability and Throughput Limitations due to Rebalances
     • Lack of Monitoring and Operational Support
     • No Disaster Recovery, Migration, Failover
     • Too many MirrorMaker Clusters
  8. Streams Replication Manager
     • Mirror Maker 2 (KIP-382)
     • Supports active-active, multi-cluster, cross DC replication & other complex scenarios
     • Leverages Kafka Connect for scalability and HA
     • Replicates data and configurations (ACL, partitioning, new topics, etc.)
     • Offset translation for failover and failback
     • Monitoring integration with SMM
     [Diagram: clusters A, B, C, X, Y connected through a Kafka Connect MM2 cluster; partitions topic1.part0 and topic1.part1 appear on other clusters as A.topic1.part0, A.topic1.part1, X.topic1.part0, X.topic1.part1]
  9. Active – Passive Architecture
  10. Active/standby
      • Producers send to primary if available, to secondary if not
      • Consumers can be migrated between primary and secondary clusters.
      [Diagram: Producers and Consumers behind VIP/Load Balancers; SRM carries data, offset syncs, and consumer checkpoints between Primary Cluster and Secondary Cluster]
  11. Configuration file
      • Simple file configuration
      • Multi-directional
      • Fine-grained replication
      • Topics white/black lists
      • Group white/black lists
      • Interval configurations
      • Supports patterns
      $ ./bin/connect-mirror-maker.sh mm2.properties
  12. Remote topics
      • Replicated topics are renamed according to ReplicationPolicy.
      • Default policy: <source>.<topic>
      • Can implement custom policies
      [Diagram: Primary Cluster holds topic1, topic2, secondary.topic1, secondary.topic2; Secondary Cluster holds topic1, topic2, primary.topic1, primary.topic2; SRM sits between them]
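
The default naming rule on this slide can be sketched in a few lines of Java. This is an illustrative stand-alone class, not MM2's own ReplicationPolicy implementation: the names `RemoteTopicNames`, `remoteName`, `sourceOf` and `wouldCycle` are hypothetical, and the sketch assumes that cluster aliases and local topic names contain no dots.

```java
import java.util.Optional;

// Illustrative sketch of the "<source>.<topic>" naming rule; all names are hypothetical.
class RemoteTopicNames {
    static final String SEPARATOR = ".";

    // Name a topic gets once it is replicated from the cluster with alias sourceAlias.
    static String remoteName(String sourceAlias, String topic) {
        return sourceAlias + SEPARATOR + topic;
    }

    // Alias of the cluster a remote topic came from, or empty for an unprefixed topic.
    // Assumes aliases and local topic names contain no dots.
    static Optional<String> sourceOf(String topic) {
        int i = topic.indexOf(SEPARATOR);
        return i < 0 ? Optional.empty() : Optional.of(topic.substring(0, i));
    }

    // True if replicating the topic to targetAlias would send data back to a
    // cluster it has already passed through.
    static boolean wouldCycle(String topic, String targetAlias) {
        String current = topic;
        Optional<String> source = sourceOf(current);
        while (source.isPresent()) {
            if (source.get().equals(targetAlias)) {
                return true;
            }
            // Strip the leading "<alias>." and inspect the next hop.
            current = current.substring(source.get().length() + SEPARATOR.length());
            source = sourceOf(current);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(remoteName("primary", "topic1"));
        System.out.println(sourceOf("primary.topic1"));
        System.out.println(wouldCycle("secondary.topic1", "secondary"));
    }
}
```

Because the origin is encoded in the name, a replication flow can skip any topic that already carries the target's alias, which is what removes the manual topic naming that plain Mirror Maker needed to avoid cycles.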
  13. Heartbeats
      • MM2 emits a heartbeat topic in each source cluster, which is replicated to other clusters
      • Downstream cluster uses this topic to verify that
        ● The connector is running
        ● The corresponding source cluster is available
      Example record in primary.heartbeats (Secondary Cluster): target=primary source=secondary Timestamp=5434356
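
One way a downstream monitor might use the heartbeat timestamp is sketched below. This is an assumption about how such a check could look, not SRM's monitoring code: `HeartbeatCheck`, `looksAlive` and the `maxSilence` threshold are hypothetical, and reading the newest record from the replicated heartbeat topic is left out.

```java
import java.time.Duration;

// Illustrative liveness check driven by the newest heartbeat record; names are hypothetical.
class HeartbeatCheck {
    // True if the latest heartbeat is recent enough to treat both the source
    // cluster and the replication connector as alive.
    static boolean looksAlive(long lastHeartbeatMillis, long nowMillis, Duration maxSilence) {
        return nowMillis - lastHeartbeatMillis <= maxSilence.toMillis();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(looksAlive(now - 2_000, now, Duration.ofSeconds(10)));
        System.out.println(looksAlive(now - 60_000, now, Duration.ofSeconds(10)));
    }
}
```

A stale heartbeat alone does not say whether the source cluster or the connector failed, so a real check would likely combine it with other signals.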
  14. Offset Syncs
      • Offset sync stream maps offsets between mirrored clusters.
      Example record in primary.offset-syncs.internal (Secondary Cluster): topic=primary.topic1 partition=4 upstreamOffset=100 downstreamOffset=102
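
The idea behind an offset sync record can be illustrated with a simplified translation function. This is a sketch of the concept only, not MM2's actual algorithm: `OffsetSyncSketch`, `OffsetSync` and `translate` are hypothetical names, and the sketch assumes that upstream and downstream offsets advance in lockstep after the sync point, which does not hold in general (for example with compaction or transaction markers).

```java
import java.util.OptionalLong;

// Simplified illustration of offset translation from one sync record; names are hypothetical.
class OffsetSyncSketch {
    // One record of the offset sync stream for a single topic partition.
    static final class OffsetSync {
        final long upstreamOffset;
        final long downstreamOffset;

        OffsetSync(long upstreamOffset, long downstreamOffset) {
            this.upstreamOffset = upstreamOffset;
            this.downstreamOffset = downstreamOffset;
        }
    }

    // Maps an upstream offset to a downstream one by applying the distance
    // from the sync point. Offsets older than the sync point cannot be
    // translated from this record, so the result is empty.
    static OptionalLong translate(OffsetSync sync, long upstreamOffset) {
        if (upstreamOffset < sync.upstreamOffset) {
            return OptionalLong.empty();
        }
        long step = upstreamOffset - sync.upstreamOffset;
        return OptionalLong.of(sync.downstreamOffset + step);
    }

    public static void main(String[] args) {
        OffsetSync sync = new OffsetSync(100, 102); // values from the slide
        System.out.println(translate(sync, 100));
        System.out.println(translate(sync, 105));
        System.out.println(translate(sync, 90));
    }
}
```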
  15. Checkpoints
      • Checkpoint stream replicates consumer group state.
      • MM2 periodically emits checkpoints in the destination cluster
      • The checkpoint topic is log-compacted to reflect only the latest offsets across consumer groups
      Example record in primary.checkpoints.internal (Secondary Cluster): topic=primary.topic1 partition=4 group=consumer-group-2 upstreamOffset=100 offset=102
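
The effect of log compaction on the checkpoint stream can be modelled with a map in which the last write per key wins. The sketch below is an in-memory illustration, not the MM2 checkpoint format: `CheckpointLog`, `emit` and `offsetsFor` are hypothetical names, and the string key layout assumes group names contain no `|` character.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory model of a log-compacted checkpoint topic; names are hypothetical.
class CheckpointLog {
    // Key "group|topic-partition" -> latest translated (downstream) offset.
    private final Map<String, Long> latest = new LinkedHashMap<>();

    // Emitting a checkpoint overwrites the previous one for the same key,
    // which is the state a reader sees after compaction.
    void emit(String group, String topic, int partition, long downstreamOffset) {
        latest.put(group + "|" + topic + "-" + partition, downstreamOffset);
    }

    // Latest checkpointed offsets for one consumer group, keyed by "topic-partition".
    Map<String, Long> offsetsFor(String group) {
        String prefix = group + "|";
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : latest.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                result.put(entry.getKey().substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CheckpointLog log = new CheckpointLog();
        log.emit("consumer-group-2", "primary.topic1", 4, 90L);
        log.emit("consumer-group-2", "primary.topic1", 4, 102L); // supersedes 90
        System.out.println(log.offsetsFor("consumer-group-2"));
    }
}
```

Since only the newest checkpoint per group and partition survives, a consumer that fails over can resume from the destination cluster's checkpoints without contacting the old cluster.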
  16. Cross-cluster offset translation
      Translate offsets between clusters via RemoteClusterUtils:
        Map<TopicPartition, OffsetAndMetadata> newOffsets = RemoteClusterUtils.translateOffsets(
            newClusterProperties, oldClusterName, consumerGroupId, timeout);
        newOffsets.forEach(consumer::seek);
      ● Offset translation is based on checkpoints in the new cluster
      ● No connection to the old cluster is required
  17. Active/standby
      • Publish to topic
      • Subscribe to *.topic
      [Diagram: Producers and Consumers behind VIP/Load Balancers; SRM carries data, offset syncs, and consumer checkpoints between Primary Cluster and Secondary Cluster]
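
The slide writes the subscription as the glob `*.topic`, while Kafka consumers subscribe by pattern using a `java.util.regex.Pattern`. A minimal sketch of an equivalent regular expression follows. `TopicPatterns` and `localAndRemote` are hypothetical names, and matching the unprefixed local topic as well as its remote copies is an assumption about the intended behaviour.

```java
import java.util.regex.Pattern;

// Builds a regex covering a topic and its "<source>.<topic>" remote copies; names are hypothetical.
class TopicPatterns {
    // Matches "topic" itself and any name ending in ".topic", such as
    // "primary.topic" or "secondary.primary.topic". Remove the trailing "?"
    // of the optional group to match remote copies only.
    static Pattern localAndRemote(String topic) {
        return Pattern.compile("(?:.+\\.)?" + Pattern.quote(topic));
    }

    public static void main(String[] args) {
        Pattern pattern = localAndRemote("topic1");
        for (String name : new String[] {"topic1", "primary.topic1", "topic10"}) {
            System.out.println(name + " -> " + pattern.matcher(name).matches());
        }
    }
}
```

With such a pattern a consumer keeps the same subscription before and after failover, because the local topic and its mirrored copy are both covered.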
  18. Primary down: fail over
      • Publish to topic
      • Migrate consumers: use RemoteClusterUtils to migrate to primary.topic (old data) and topic (new data)
      [Diagram: Producers and Consumers behind VIP/Load Balancers; SRM carries data, offset syncs, and consumer checkpoints between Primary Cluster and Secondary Cluster]
  19. Primary down: fail over
      • Publish to topic
      • Migrate consumers
      $ srm-control offsets --bootstrap-server :9092 --source primary --group foo --export > out.csv
      $ kafka-consumer-groups --bootstrap-server B_host:9092 --reset-offsets --group foo --execute --from-file out.csv
  20. Primary permanently lost? Recover from secondary.
      • Publish to topic
      • Lost primary topics can be recovered from remote topics on secondary cluster.
      [Diagram: Primary Cluster, Secondary Cluster and a new Primary-2 cluster linked by SRM. Topic labels: topic1, topic2, secondary.topic1, secondary.topic2, secondary.primary.topic1, secondary.primary.topic2, primary.topic1, primary.topic2, primary-2.topic1, primary-2.topic2. Annotation: Data from old primary]
  21. Active – Passive Demo
  22. Active/standby Demo Scenario
      • Publish to retail-store
      • Subscribe to retail-store and nyc_retail-store
      [Diagram: NiFi producers on each side; SRM between Paris Cluster and NYC Cluster]
  23. Active - Active
  24. Active/active: Cross Consumer Groups or XDCR
      • Publish to topic
      • Produce to both clusters.
      • Consume from both clusters.
      • Consumer subscription defines the patterns
      [Diagram: Producers and Consumers on both sides behind VIP/Load Balancers; SRM between Primary Cluster and Secondary Cluster]
  25. A/ Cross-cluster consumer groups
  26. Cross-cluster consumer groups
      • Effectively one big consumer group
      • Publish to topic; produce to both clusters.
      • Subscribe to topic
      [Diagram: Producers and Consumers on both sides behind VIP/Load Balancers; SRM between Primary Cluster and Secondary Cluster; records R1 and R2]
  27. Cross-cluster consumer groups
      • What does it take to fail over? Nothing
      • Primary Cluster DC temporarily lost
      • Publish to topic; produce to both clusters.
      • Subscribe to topic
      [Diagram: same topology with the Primary Cluster unavailable; record R3]
  28. Cross-cluster consumer groups
      • What does it take to fail back? Nothing either
      • DC issue resolved
      • Recover from last point and resume; some events may be delayed
      • Publish to topic; produce to both clusters.
      [Diagram: same topology with the Primary Cluster restored; record R4]
  29. Cross-cluster consumer groups
      • DC permanently lost
      • Bring new DC
      • Data previously in primary is not lost and can be recovered from secondary
      • Publish to topic; produce to both clusters.
      • Subscribe to topic
      [Diagram: a new Primary-2 Cluster replaces the lost Primary Cluster; SRM links it to the Secondary Cluster]
  30. XDCR
  31. Cross Data Center Replication (XDCR)
      • All consumers process all records
      • Publish to topic; produce to both clusters.
      • Subscribe to *.topic
      [Diagram: Producers and Consumers on both sides behind VIP/Load Balancers; SRM between Primary Cluster and Secondary Cluster; records R1 and R2 reach consumers on both sides]
  32. Active – Passive Demo
  33. Other use cases
  34. Cloud migration or Kafka version upgrade
  35. Aggregation for Analytics
  36. Monitoring: Demo integration with SMM
  37. THANK YOU