A little about Message Queues
Matt Brender
Developer Advocate at Basho
tweet me @mjbrender
A common pattern
Can’t I just…
I just want…
Consistent Indexes
Asynchronous
One Example from Kafka:
The Choices
This or That
• NoSQL
  • Types
    • Key/Value
    • Document
    • Columnar
    • Graph
• “Message Queues”
  • Pub/Sub
  • Commit Log
• Hadoop
  • HDFS
  • Map/Reduce
  • YARN
• Storm
  • Real-time stream processing
• Spark
  • Successor to Map/Reduce
Apache ActiveMQ
RabbitMQ
Apache Kafka
Apache ZooKeeper (Kafka requires it)
From here…
Read a lot
• Rabbit vs Kafka in detail
• Exploring Message Brokers
• RabbitMQ tutorials
• Intro to SQS
• queues.io/
Get Hands-on
Matt Brender
@mjbrender
Thank You

A little about Message Queues - Boston Riak Meetup

Speaker notes

  1. I find this personally and professionally interesting. I’m going to make sure we’re all starting from the same assumptions by discussing common factors in the state of data management. Then we’ll work through a disturbingly common pattern that our systems end up in. From this pain point, we’ll look into some of the structural considerations for your application. IDC and EMC project that data will grow to 40 zettabytes by 2020, a 50-fold increase from the beginning of 2010.[3] Computerworld states that unstructured information might account for more than 70%–80% of all data in organizations.[4]
  2. I’ve implemented exactly zero of what I’m talking about. What I do offer is the good fortune of speaking to people who build these systems, basically non-stop. There is a lot to learn from just listening. I’ve spoken to hundreds of developers from companies of every shape and size. I’ve argued with ops engineers, and I’ve listened to data scientists. I’ve read eight years of posts, going back to Amazon’s 2007 Dynamo paper, on which Basho based Riak’s design.
  3. And I have the good fortune to listen in on a ton of conversations. Our database at Basho, Riak, is used by many companies to store everything from session data to log aggregation. In these conversations, I always pivot to asking about their architecture: the how, the why, and the what-could-be-better.
  4. You’ll also see some hand-drawn slides, courtesy of Martin Kleppmann. He gave me permission to reuse his work after I tweeted him, and I want to give back by telling you about his book. Designing Data-Intensive Applications is a must-read. “The buzzwords that fill this space are a sign of enthusiasm for the new possibilities, which is a great thing. However, as software engineers and architects, we also need to have a technically accurate and precise understanding of the various technologies and their trade-offs if we want to build good applications. For that understanding, we have to dig deeper than buzzwords.”
  5. What I’m going to talk about today isn’t really new — some people have known about these ideas for a long time. However, they aren’t as widely known as they should be. If you work on a non-trivial application, something with more than just one database, you’ll probably find these ideas very useful.
  6. We start with a simple web app. It has multiple clients, for HTTP and native mobile, and all of its data is stored in our familiar RDBMS. And we’re successful! But success brings more demand, and demand means we need to speed things up. Let’s assume that you’re working on a web application. In the simplest case, it probably has the stereotypical three-tier architecture: you have some clients (which may be web browsers, or mobile apps, or both), which make requests to a web application running on your servers. The web application is where your application code or “business logic” lives.
  7. So we add a cache. We see performance improve for our users and all is well again. Then another need arrives. Perhaps you get more users, making more requests, your database gets slow, and you add a cache to speed it up – perhaps memcached or Redis, for example.
  8. We need search, which our RDBMS was not scoped to handle or whose semantics don’t fit our needs, so we add a search solution like Apache Solr or Elasticsearch. Perhaps you need to add full-text search to your application, and the basic search facility built into your database is not good enough, so you end up setting up a separate indexing service such as Elasticsearch or Solr.
  9. Perhaps you need to move some expensive operations out of the web request flow and into an asynchronous background process, so you add a message queue that lets you send jobs to your background workers: ActiveMQ, RabbitMQ, or something home-grown on top of Redis.
  10. Now that your business analytics are working, you find that your search system is no longer keeping up… but you realise that since you have all the data in HDFS anyway, you could actually build your search indexes in Hadoop and push them out to the search servers, and the system just keeps getting more and more complicated…
  11. …and the result is complete and utter insanity. We’re left with an incoherent jumble of services that all communicate with essentially the same data. Updates are terrifying because we fear the complexity we’ve relied on. How did we get into that state? How did we end up with such complexity, where everything is calling everything else, and nobody understands what is going on? It’s not that any particular decision we made along the way was bad. There is no one database or tool that can do everything that our application requires – we use the best tool for the job, and for an application with a variety of features that implies using a variety of tools. Also, as a system grows, you need a way of decomposing it into smaller components in order to keep it manageable. That’s what microservices are all about. But if your system becomes a tangled mess of interdependent components, that’s not manageable either.
  12. So how do we keep these different data systems in sync? There are a few different techniques. A popular approach is so-called dual writes:
  13. The dual-writes approach is simple: it’s your application code’s responsibility to update data in all the right places. For example, if a user submits some data to your web app, there’s some code in the web app that first writes the data to your database, then invalidates or refreshes the appropriate cache entries, then re-indexes the document in your full-text search index, and so on. (Or maybe it does those things in parallel; that doesn’t matter for our purposes.)
  14. The dual writes approach is popular because it’s easy to build, and it more or less works at first. But I’d like to argue that it’s a really bad idea, because it has some fundamental problems. The first problem is race conditions. The following diagram shows two clients making dual writes to two datastores. Time flows from left to right, following the black arrows:
  15. Here, the first client (teal) is setting the key X to be some value A. They first make a request to the first datastore – perhaps that’s the database, for example – and set X=A. The datastore responds saying the write was successful. Then the client makes a request to the second datastore – perhaps that’s the search index – and also sets X=A. At the same time as this is happening, another client (red) is also active. It wants to write to the same key X, but it wants to set the key to a different value B. The client proceeds in the same way: it first sends a request X=B to the first datastore, and then sends a request X=B to the second datastore. All these writes are successful. However, look at what value is stored in each database over time:
  16. In the first datastore, the value is first set to A by the teal client, and then set to B by the red client, so the final value is B. In the second datastore, the requests arrive in a different order: the value is first set to B, and then set to A, so the final value is A. Now the two datastores are inconsistent with each other, and they will permanently remain inconsistent until sometime later someone comes and overwrites X again.
  17. And the worst thing: you probably won’t even notice that your database and your search indexes have gone out of sync, because no errors occurred. You’ll probably only realize six months later, while you’re doing something completely different, that your database and your indexes don’t match up, and you’ll have no idea how that could have happened.
  18. In this case, the most straightforward approach is fundamentally flawed. We need to balance the availability of information, the queryable state it is in, and whether or not we can afford the complexity.
  19. The same information in many places.
  20. The basis of all these choices is the ability to move work out of the synchronous, low-latency flow of an application request.
  21. We have many choices and many angles to our balancing act to keep in mind. So let’s walk through a few that are incredibly important in the choice of your database.
  22. “Message queue” is a catch-all name for whatever acts as a data highway from your applications to your database services, keeping data synchronized and avoiding the insanity architecture above. “NoSQL” tells you nothing about what’s important; we’ll get into that further. Hadoop is actually a collection of tools, not a single solution in and of itself. The Hadoop Distributed File System (HDFS) is a multi-server filesystem designed for high throughput at the cost of high latency. It tolerates some failure scenarios and falls into the data warehouse world. Unlike NoSQL databases, HDFS is not something low-latency applications write to and then immediately read from! That’s not what it’s intended to do. Map/Reduce is fundamentally a querying system designed for parallel computation. It’s loved for getting people off of multi-million-dollar systems and letting them scale out, but it comes at the cost of fitting every job into its own mapper and reducer design. YARN is Hadoop’s framework for resource management and job scheduling. Spark, another Apache project, is widely recognized as the successor to Map/Reduce. It supports map/reduce-style processing while also exposing data science processing through its clients, such as Python and Scala. Data is pulled from disk and manipulated in memory. All of these are often misapplied to the same problem set. http://java.dzone.com/articles/exploring-message-brokers
  23. Apache ActiveMQ is the most popular and powerful open source messaging and Integration Patterns server. Apache ActiveMQ is fast, supports many Cross Language Clients and Protocols, comes with easy to use Enterprise Integration Patterns and many advanced features while fully supporting JMS 1.1 and J2EE 1.4. Apache ActiveMQ is released under the Apache 2.0 License
  24. Supported by Pivotal. Robust messaging for applications. Easy to use. Runs on all major operating systems. Supports a huge number of developer platforms. Open source and commercially supported.
  25. Supported by Confluent.io, founded by the team that created Kafka at LinkedIn. Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. Fast: a single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. Scalable: Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of coordinated consumers. Durable: messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact. Distributed by design: Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
  26. ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented, a lot of work goes into fixing the bugs and race conditions that are inevitable. Because these services are difficult to implement, applications usually skimp on them at first, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.
  27. Not a question so much as a challenge for you: get hands-on with technology, right away. Sometimes you just have to experience a system firsthand to see its value. Don’t be scared of it. Whether you have the benefit of choosing an open source solution or simply need to spin up a server to test something, go use it right now. Don’t try to architect the perfect solution up front, because so much of what you’ll need to know will come from using it.
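Speaker note 7 describes putting a cache such as memcached or Redis in front of a slow database. A minimal sketch of the common cache-aside pattern follows, assuming plain Python dicts stand in for both the database and the cache; the class `CacheAside` and its methods are hypothetical illustrations, not the API of any caching library.

```python
class CacheAside:
    """Cache-aside reads and writes over two dict-backed stand-in stores."""

    def __init__(self, db):
        self.db = db        # stands in for the RDBMS (the source of truth)
        self.cache = {}     # stands in for memcached or Redis
        self.db_reads = 0   # how often a read fell through to the database

    def read(self, key):
        if key in self.cache:        # hit: no database round trip
            return self.cache[key]
        self.db_reads += 1           # miss: read the database...
        value = self.db[key]
        self.cache[key] = value      # ...and populate the cache for next time
        return value

    def write(self, key, value):
        self.db[key] = value         # update the source of truth first
        self.cache.pop(key, None)    # then invalidate; the next read repopulates


if __name__ == "__main__":
    store = CacheAside({"user:1": "Alice"})
    store.read("user:1")             # miss: goes to the database
    store.read("user:1")             # hit: served from the cache
    store.write("user:1", "Alicia")
    print(store.read("user:1"), store.db_reads)  # Alicia 2
```

Note that the write path touches two systems, so it is itself a small instance of the dual writes described in speaker note 13.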
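Speaker note 9 describes moving expensive operations out of the web request and onto background workers via a message queue. The sketch below shows that producer/worker shape using Python's in-process `queue.Queue` and threads. This is an assumption for illustration only: brokers such as ActiveMQ or RabbitMQ run the queue as a separate service so producers and workers can live on different machines, which this sketch does not model. `expensive_operation` and `run_background_jobs` are hypothetical names.

```python
import queue
import threading


def expensive_operation(n):
    # Hypothetical slow task (resizing an image, sending an email, ...).
    return n * n


def worker(jobs, results, lock):
    """Pull jobs off the queue until a None shutdown sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            return
        value = expensive_operation(job)
        with lock:
            results.append(value)
        jobs.task_done()


def run_background_jobs(payloads, num_workers=2):
    jobs = queue.Queue()
    results = []
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(jobs, results, lock))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for p in payloads:
        jobs.put(p)        # the "web request" side: enqueue and move on
    jobs.join()            # block until every job has been processed
    for _ in threads:
        jobs.put(None)     # one shutdown sentinel per worker
    for t in threads:
        t.join()
    return sorted(results)  # workers finish in any order, so sort for display


if __name__ == "__main__":
    print(run_background_jobs(range(5)))  # [0, 1, 4, 9, 16]
```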
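Speaker notes 13 to 16 describe how dual writes can leave two datastores permanently inconsistent. The simulation below replays the teal/red scenario deterministically: each client sends its write to the database and then to the search index, and each store keeps whichever write arrives last. It then enumerates every arrival order that respects each client's own sequence. It assumes every request is instantaneous and succeeds; it does not model writes that fail partway. The helpers `dual_write`, `replay`, and `interleavings` are hypothetical.

```python
def dual_write(client, key, value):
    """One client's dual write, split into its two separate requests."""
    return [("database", client, key, value),
            ("search_index", client, key, value)]


def replay(schedule):
    """Apply requests in arrival order; the last write to a key wins."""
    stores = {"database": {}, "search_index": {}}
    for store, _client, key, value in schedule:
        stores[store][key] = value
    return stores


def interleavings(a, b):
    """Every merge of two request lists that keeps each client's own order."""
    if not a:
        return [list(b)]
    if not b:
        return [list(a)]
    return ([[a[0]] + rest for rest in interleavings(a[1:], b)] +
            [[b[0]] + rest for rest in interleavings(a, b[1:])])


if __name__ == "__main__":
    teal = dual_write("teal", "X", "A")
    red = dual_write("red", "X", "B")

    # The arrival order from speaker notes 15 and 16.
    print(replay([teal[0], red[0], red[1], teal[1]]))
    # {'database': {'X': 'B'}, 'search_index': {'X': 'A'}}

    orders = interleavings(teal, red)
    diverged = [o for o in orders
                if replay(o)["database"] != replay(o)["search_index"]]
    print(f"{len(diverged)} of {len(orders)} arrival orders diverge")
    # 2 of 6 arrival orders diverge
```

No request in the diverging orders returns an error, which is why, as note 17 says, nobody notices.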
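Speaker note 22 and the “This or That” slide lump several delivery styles under “message queues”. One common way to tell them apart: a point-to-point work queue hands each message to exactly one consumer, while a pub/sub topic gives every subscriber its own copy. The sketch below contrasts the two in memory. Round-robin delivery is an assumption chosen for simplicity; real brokers offer other dispatch policies, and RabbitMQ, for example, can be configured for either style. The classes `WorkQueue` and `PubSubTopic` are hypothetical.

```python
from collections import deque


class WorkQueue:
    """Point-to-point queue: each message goes to exactly one consumer."""

    def __init__(self, consumer_names):
        self.inboxes = {name: [] for name in consumer_names}
        self._next = deque(consumer_names)   # round-robin delivery order

    def publish(self, message):
        name = self._next[0]
        self._next.rotate(-1)                # the next message goes to the next consumer
        self.inboxes[name].append(message)


class PubSubTopic:
    """Publish/subscribe topic: every subscriber gets its own copy."""

    def __init__(self, subscriber_names):
        self.inboxes = {name: [] for name in subscriber_names}

    def publish(self, message):
        for inbox in self.inboxes.values():
            inbox.append(message)


if __name__ == "__main__":
    jobs = WorkQueue(["worker-1", "worker-2"])
    events = PubSubTopic(["cache", "search-index"])
    for msg in ["m1", "m2", "m3"]:
        jobs.publish(msg)
        events.publish(msg)
    print(jobs.inboxes)    # {'worker-1': ['m1', 'm3'], 'worker-2': ['m2']}
    print(events.inboxes)  # {'cache': ['m1', 'm2', 'm3'], 'search-index': ['m1', 'm2', 'm3']}
```

A work queue suits the background jobs of note 9; pub/sub suits keeping several derived stores (cache, search index) fed from one stream of changes.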
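Speaker note 25 calls Kafka “publish-subscribe messaging rethought as a distributed commit log” with partitioned streams. The sketch below is a toy, single-process version of that idea, not Kafka's API: writes are appended once to a log partitioned by key, and each downstream store is a consumer that tracks its own offsets and applies records in log order. Because the log decides the order of writes to a key once, the database and search index from notes 15 and 16 converge on the same value even when one lags behind. Using `zlib.crc32` for partitioning is an assumption (Kafka's default partitioner uses a different hash), and `CommitLog` and `Consumer` are hypothetical names.

```python
import zlib


class CommitLog:
    """Append-only log split into partitions; records are never modified."""

    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Same key -> same partition, so the order of writes per key is kept.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1   # (partition, offset)


class Consumer:
    """Reads each partition from its own offsets and applies records in order."""

    def __init__(self, log):
        self.log = log
        self.offsets = [0] * len(log.partitions)
        self.state = {}   # e.g. the database, or the search index

    def poll(self):
        for p, records in enumerate(self.log.partitions):
            for key, value in records[self.offsets[p]:]:
                self.state[key] = value
            self.offsets[p] = len(records)   # remember how far we have read


if __name__ == "__main__":
    log = CommitLog()
    database, search_index = Consumer(log), Consumer(log)
    log.append("X", "A")    # teal client's write
    log.append("X", "B")    # red client's write
    database.poll()          # the database catches up first...
    print(database.state, search_index.state)  # {'X': 'B'} {}
    search_index.poll()      # ...and the search index catches up later
    print(database.state, search_index.state)  # {'X': 'B'} {'X': 'B'}
```

The trade-off is that stores become consistent eventually rather than immediately, which ties back to the “Asynchronous” and “Consistent Indexes” slides.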