
Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications


Adoption of open source software (OSS) at the enterprise level has flourished, as more businesses discover the considerable advantages that open source solutions hold over their proprietary counterparts, and as the enterprise mentality around open source continues to shift. We will discuss how to identify good application candidates for Apache Cassandra and Kafka as well as best practices and common pitfalls.

This presentation will also cover:
The origins of Apache Cassandra and Kafka and how these technologies have come to shape how next-gen applications are built.
Production use cases of Cassandra and Kafka: real-time payments and buying a house (Worldpay and Lendi).
Core concepts that make the magic: the technical attributes that make your project a good fit for these technologies, and the architectural patterns that make the best use of their capabilities.
Speaker: Adam Zegelin, SVP Engineering and Co-Founder, Instaclustr
As Instaclustr's founding software engineer, Adam provides the foundation knowledge of Instaclustr's capability and engineering environment. Adam is also focused on providing Instaclustr's contribution to the broader open source community on which our products and the services rely, including Apache Cassandra, Apache Spark, and other core technologies such as CoreOS and Docker. Prior to founding Instaclustr, Adam worked on large-scale big data projects with Australian Government agencies.

Published in: Technology


  1. Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications. Adam Zegelin, Founding Software Engineer, Instaclustr
  2. Introduction
     • Adam Zegelin
       ◦ Co-founded Instaclustr 5 years ago
       ◦ In Canberra, Australia
       ◦ Current focus is Cassandra on Kubernetes
     • Instaclustr
       ◦ Managed Apache Cassandra, Spark and Kafka in the cloud
         ▪ AWS, GCP, Azure & IBM
         ▪ 3000 nodes under management
         ▪ 24×7×365 support
       ◦ Consulting
         ▪ Schema & application design
         ▪ Workshops & Training
       ◦ 2nd-level on-call support for on-premise deployments
  3. Agenda
     • Introduction to Cassandra and Kafka
     • Real-world Use Cases
       ◦ Worldpay
       ◦ Lendi
       ◦ Instaclustr
     • Partitioning: the key to scale
     • Fitting and architecting for your use case
  4. Linearly Scalable, Always Available, Multi-Region Data Store
     • Apache Cassandra is the leading NoSQL operational database for high-scale and high-reliability applications.
     • Shared-nothing peer-to-peer architecture provides reliability up to 100% (with Instaclustr SLAs).
       ◦ Replicated data and multiple nodes capable of fulfilling queries
         ▪ Node outage? Service just keeps running
       ◦ Full online maintenance and in-place upgrades
     • Low latency for operational applications
       ◦ Sub-10ms P95 reads and writes achievable
     • Native active-active multi data center support
       ◦ Geographic distribution (to meet latency requirements)
       ◦ Disaster resilience
       ◦ Workload isolation (analytics)
     • Cassandra is a data storage system, not an analytics/query engine or place to run logic
  5. Typical Use Cases (Cassandra)
     • High write to read ratio
     • Data is rarely updated
       ◦ Including explicit deletes
     • The Primary Key is known at read time
     • Limited filtering & aggregation
     • No JOINs or referential integrity
     • Transaction logging
     • Time series data
     • IoT status and event history
     • Health tracker data
     • Order & package statuses & tracking
     • Weather service history
     • Messages and email envelopes
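The access pattern on this slide (append-mostly writes, reads that always name the primary key, no joins) can be illustrated with a small in-memory model. This is only a sketch, not Cassandra client code: the `SensorReadings` class and the choice of `(sensor_id, day)` as partition key are hypothetical and do not come from the talk.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime


class SensorReadings:
    """Toy model of a time-series table: partition key -> rows ordered by time."""

    def __init__(self):
        # (sensor_id, day) plays the role of the partition key (an assumption).
        self._partitions = defaultdict(list)

    def write(self, sensor_id, ts, value):
        # A write goes to the partition selected by the key and is kept in time order.
        insort(self._partitions[(sensor_id, ts.date())], (ts, value))

    def read(self, sensor_id, day, limit=100):
        # A read must name the partition key; there is no scan and no JOIN.
        return self._partitions.get((sensor_id, day), [])[:limit]


readings = SensorReadings()
readings.write("s1", datetime(2018, 5, 1, 12, 0), 21.5)
readings.write("s1", datetime(2018, 5, 1, 11, 0), 20.9)
rows = readings.read("s1", datetime(2018, 5, 1).date())
```

Because every read is addressed by key, the lookup touches a single partition no matter how many sensors or days are stored, which is the property the slide's use cases rely on.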
  6. Queuing, Pub/Sub and Streaming at Scale
     • Apache Kafka is a distributed streaming platform
       ◦ Publish and subscribe to streams of records
         ▪ Similar to a message queue or enterprise messaging system (EMS)
       ◦ Store streams of records
         ▪ Fault-tolerant
         ▪ Durable
       ◦ Process streams of records
         ▪ As they occur
         ▪ Randomly, from any position in the stream
     • Replicated architecture
     • High-level similarities to Cassandra
       ◦ Scalability
       ◦ Reliability
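The idea of storing a stream and reading it from any position can be sketched with an append-only list whose indexes act as offsets. This is a simplified illustration, not the Kafka client API; `TopicLog` and its methods are hypothetical names.

```python
class TopicLog:
    """Toy append-only log: records keep their offset and are never modified."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=10):
        # A consumer may start from any offset, not only from the tail.
        return self._records[offset:offset + max_records]


log = TopicLog()
for event in ["created", "paid", "shipped"]:
    log.append(event)

replay = log.read(0)  # a new consumer replays the stream from the start
latest = log.read(2)  # another consumer reads only the newest record
```

Reading does not remove records, so several consumers can process the same stream independently, each tracking its own offset.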
  7. Typical Use Cases (Kafka)
     • As a message bus
       ◦ Loose coupling between producers and consumers
       ◦ Basis for micro-services
     • As a commit log
       ◦ A store of logical transactions
       ◦ Populating analytical data stores or edge caches
     • As a buffer
       ◦ Manage backpressure & workload spikes
     And when combined with Kafka Streams/Spark Streaming…
     • As the basis of a streaming architecture
       ◦ (Near) real-time analytics
       ◦ Data processing pipelines
  8. Typical Use Cases cont’d
     • Website activity tracking
       ◦ Page views
       ◦ Searches
       ◦ Other user actions
     • Metrics
       ◦ Operational monitoring data
     • Log aggregation
       ◦ Centralized logging
     • Event sourcing
       ◦ Application state changes
       ◦ “We don't just want to see where we are, we also want to know how we got there”
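Event sourcing, the last item above, can be shown in a few lines: state is never stored directly but derived by replaying the recorded changes. The account events below are hypothetical example data, and this is a minimal sketch of the pattern rather than anything specific to Kafka.

```python
from functools import reduce


def apply_event(balance, event):
    """Return the new balance after applying one (hypothetical) account event."""
    kind, amount = event
    if kind == "deposited":
        return balance + amount
    if kind == "withdrawn":
        return balance - amount
    raise ValueError(f"unknown event type: {kind}")


events = [("deposited", 100), ("withdrawn", 30), ("deposited", 5)]

# The current state is derived by replaying the full history...
balance = reduce(apply_event, events, 0)
# ...and any earlier state can be rebuilt from a prefix of the same log.
balance_after_two = reduce(apply_event, events[:2], 0)
```

Keeping the events rather than only the final balance is what makes it possible to answer "how we got there" as well as "where we are".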
  9. Case study: Worldpay
     • Payment processor
       ◦ Spun out of RBS in 2010
       ◦ Merged with Vantiv in the US in Jan 2018 for USD 10.4B to form Worldpay Inc.
     • Processes
       ◦ >40 million transactions per day
       ◦ For 400,000 merchants
       ◦ 42% of all UK non-cash transactions
  10. Case study: Worldpay cont’d
      • Re-architecting of Worldpay’s XML Payment API
        ◦ Facilitates ~40M transactions per month
      • New architecture based on open source technologies
        ◦ Including Cassandra and Kafka
        ◦ To provide scalability, availability and reduced costs
      • New Idempotency Service
        ◦ First project to use the new architecture
        ◦ Provides capabilities to ensure payments are not repeated
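The talk does not describe how the Idempotency Service is implemented. As a general illustration of the technique, a service can record the outcome of each request under a client-supplied key and return the recorded outcome when the same key arrives again. The `IdempotencyStore` class, the `order-123` key and the `charge` handler below are all hypothetical.

```python
class IdempotencyStore:
    """Toy put-if-absent store keyed by a client-supplied idempotency key."""

    def __init__(self):
        self._results = {}

    def process_once(self, key, handler):
        # If the key was seen before, return the stored result instead of
        # running the handler (and charging the customer) a second time.
        if key in self._results:
            return self._results[key], False
        result = handler()
        self._results[key] = result
        return result, True


charges = []


def charge():
    charges.append(10.00)
    return "approved"


store = IdempotencyStore()
first = store.process_once("order-123", charge)
retry = store.process_once("order-123", charge)  # e.g. a client retry after a timeout
```

This single-process sketch is not thread-safe. In a distributed data store the check and the write would have to happen as one atomic operation, which is why the data model and consistency settings matter for this kind of service.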
  11. Case study: Worldpay cont’d
      • Challenges
        ◦ Tight deployment timeframe
        ◦ Very high availability expectations
        ◦ Low latency requirements
      • Utilises Cassandra to provide highest levels of availability and scalability
        ◦ 18-node cluster
        ◦ 3 AWS regions (in Europe)
      • Leverages Cassandra’s tuneable consistency
        ◦ QUORUM = strong consistency across regions
        ◦ Still able to operate with a whole region unavailable
        ◦ Latency is tolerable (restricted to EU)
      • Simple data model with atomic reads/writes
        ◦ Fits well with Cassandra capability
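The claim that QUORUM gives strong consistency while tolerating the loss of a whole region can be checked with simple arithmetic. The slide gives 18 nodes in 3 regions but not the replication factor, so the sketch below assumes 3 replicas per region; the region names are placeholders.

```python
def quorum(total_replicas):
    """QUORUM is a majority of all replicas across every data center."""
    return total_replicas // 2 + 1


# Assumed replication factor per region (not stated in the talk).
replicas_per_region = {"region-a": 3, "region-b": 3, "region-c": 3}
total = sum(replicas_per_region.values())
needed = quorum(total)


def quorum_available(down_regions):
    """True if enough replicas remain reachable to satisfy QUORUM."""
    live = sum(n for region, n in replicas_per_region.items()
               if region not in down_regions)
    return live >= needed


one_region_down = quorum_available({"region-a"})
two_regions_down = quorum_available({"region-a", "region-b"})
# Reads and writes at QUORUM overlap in at least one replica when R + W > N.
overlapping = needed + needed > total
```

Under these assumptions a majority is 5 of 9 replicas, so losing one region still leaves 6 reachable, while losing two leaves only 3 and QUORUM requests would fail.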
  12. Case study: Worldpay cont’d
      • Worked with Instaclustr to accelerate development and time to stable service:
        ◦ Consulting engagement assisted with data model design
        ◦ Cassandra cluster run on Instaclustr managed service
          ▪ Production ready in weeks
      • Initial preference was to run on-prem
        ◦ Security compliance
        ◦ Did not expect cloud to meet latency requirements
      • However, timeframes did not allow establishment of internal deployment
        ◦ Used Instaclustr’s managed Cassandra service on AWS for initial go-live.
        ◦ Now satisfied as a long-term solution
  13. Case study: Lendi
      • Australia’s leading online home loan lender
        ◦ Processing over 90% of Australia’s online lending enquiries.
      • Re-architecture of their platform following a major funding round
        ◦ Customer and data-centric
  14. Case study: Lendi cont’d
      • Integration-heavy environment
        ◦ Bespoke interfaces with banks, etc.
      • Moving to a micro-services architecture
        ◦ Kafka as a message bus
      • New architecture
        ◦ Decoupled application code from embedded data sets from various business applications
        ◦ Unified data models from the various point solutions and market segments
        ◦ Enabled extensive scale
          ▪ Supports rapid and large growth in data as the consumer base grows
  15. Case study: Instaclustr
      • Cassandra
        ◦ Storage for monitoring metrics & events
        ◦ Custom collector
        ◦ RabbitMQ transport
          ▪ Will eventually move to Kafka as the transport
        ◦ Metrics are processed by Riemann
          ▪ Raises PagerDuty alerts, tickets, emails
          ▪ Writes to Cassandra
      • Kafka
        ◦ Centralised logging
        ◦ Events are collected by fluentd
        ◦ Pumped into Logstash via Kafka
        ◦ Indexed via Elasticsearch
        ◦ Viewed with Kibana
  16. Partitioning: the key to scale
      • Partitioning
        ◦ Using a key in your data to split the data across multiple servers
      • Manual partitioning is possible but painful
      • Cassandra and Kafka make partitioning transparent
        ◦ But it still needs conscious consideration
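Splitting data across servers by key comes down to a deterministic function from key to server. The sketch below uses a hash and a modulo purely for illustration; `partition_for`, the server names and the customer keys are hypothetical, and real systems use more elaborate schemes so that adding a server moves less data.

```python
import hashlib


def partition_for(key, num_servers):
    """Map a key to a server index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


servers = ["server-0", "server-1", "server-2"]
keys = ["customer-1", "customer-2", "customer-3", "customer-1"]

# The same key always lands on the same server, so a read by key
# only has to contact one server.
placement = [(k, servers[partition_for(k, len(servers))]) for k in keys]
```

The "conscious consideration" is in choosing the key: a key with few distinct values, or one value that dominates traffic, concentrates load on a single server regardless of how many are available.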
  17. Cassandra Cluster: Cluster > Data Center (optional) > Rack (optional, recommended) > Node
  18. Partitioning
  19. Partitioning
  20. Partitioning
  21. Cassandra Partitions
  22. Kafka Brokers, Topics and Partitions
      • Broker
        ◦ Node/server/VM
      • Topic
        ◦ Logical grouping of data (category/feed/name)
        ◦ Settings:
          ▪ Replication
          ▪ Partition count
          ▪ Retention
          ▪ Compaction
          ▪ …
  23. Kafka Topics and Partitions (cont’d)
      • Partition
        ◦ Subset of messages in a topic
          ▪ Has a single master broker
          ▪ Guarantees ordered delivery within that subset
        ◦ Number of partitions is set on topic creation
  24. Kafka Partitions in Action
      • Messages are mapped to a partition by the Producer
        ◦ Randomly/round-robin
        ◦ Hash of record key
      • Consumers are members of Consumer Groups
      • Consumer Groups register to consume records from Topics
      • Each Consumer in a Consumer Group is the exclusive consumer of a “fair share” of partitions in the topic.
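Both halves of this slide, producers choosing a partition and consumers sharing partitions, can be sketched in plain Python. This is not the Kafka client: the partition count, the consumer names and the use of CRC32 as the hash are assumptions made for illustration, and Kafka's own partitioner and assignment strategies differ in detail.

```python
import itertools
import zlib

NUM_PARTITIONS = 6  # fixed when the topic is created (value assumed here)
_round_robin = itertools.cycle(range(NUM_PARTITIONS))


def choose_partition(key=None):
    """Keyed records hash to a stable partition; unkeyed records rotate."""
    if key is None:
        return next(_round_robin)
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def assign(consumers, num_partitions=NUM_PARTITIONS):
    """Give each consumer in a group an exclusive, near-equal share of partitions."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


shares = assign(["consumer-a", "consumer-b", "consumer-c", "consumer-d"])
```

Because a keyed record always maps to the same partition and each partition has exactly one consumer within a group, all records for a given key are processed in order by a single consumer.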
  25. Fitting and architecting for your use case: Cassandra
      • Big data
        ◦ One or more individually big (>1TB) tables
      • Need to pre-determine read pattern
        ◦ At least to partition key
      • Very low cost writes
        ◦ Great for high write / read ratio use cases
      • Ideal for small reads
        ◦ 1, 10, 100, 1000 rows at a time
      • No limits to horizontal scaling (data size or ops/sec)
        ◦ Provided you can find a partition that fits.
      • No relational integrity
        ◦ No Foreign Keys, no JOINs
      • Limited filtering, aggregation
  26. Fitting and architecting for your use case: Kafka
      • Big data
        ◦ 5k+ messages/topic/second
      • Not transactional
        ◦ Unlike traditional MQ tech
        ◦ Although exactly-once delivery is now available
      • Kafka Streams is a very powerful tool for analysis and mutations on data streams
  27. Adam Zegelin, Founding Software Engineer, adam@instaclustr.com
