
Cassandra queuing

Published in: Technology
  • Polling for events in Cassandra could quite easily be avoided by using a distributed coordination service such as ZooKeeper in conjunction with Cassandra.

  1. Queuing with Cassandra (David Strauss, @davidstrauss)
  2. Why we use queues
     • Loose coupling
         • Different languages/interprocess
         • Integrate legacy and new systems
         • Allow publishers to be unaware of listeners
     • Asynchronous requests
         • Long-running tasks
     • Failure tolerance
         • Of nodes within the queue system
         • Of systems using the queue
  3. Possible queue guarantees
     • Deliver at least once + deliver no more than once = deliver exactly once
  4. “Enterprise” queues
     • ActiveMQ
         • Delivers at most once
         • Punts “at least once” to lower-level redundancy
     • RabbitMQ
         • Clustered
             • No guarantee of “at most once”
             • Will deliver at least once
         • Unclustered
             • Delivers at most once, but could fail
  5. Job queues
     • Beanstalkd
         • Delivers at most once
         • Can optionally persist to disk
     • Gearman
         • Delivers at most once
         • No persistence between restarts
  6. What's annoying about these
     • Inflexible service levels
         • An entire installation or cluster guarantees exactly the same delivery semantics for all messages
         • Not all messages are equal
     • No scalable “at least once” queue
         • RabbitMQ replicates all messages to all nodes
             • Limits scalability to what a single node can do
         • Sending messages redundantly to multiple job queue nodes makes multiple delivery the common case
         • Application-integrated sharding doesn't count
  7. Why Cassandra?
     • Processes queuing messages can use ConsistencyLevel to indicate a service level
         • CL.ZERO is “would be nice to deliver”
             • Same guarantee as a non-persistent queue
         • CL.ONE is low-latency with some durability
             • Same guarantee as a single-node persistent queue
         • CL.QUORUM (or more) is “delivery at least once”
             • Same guarantee as clustered persistence (e.g. Rabbit)
     • Queue is sharded/partitioned across nodes
         • Unlike RabbitMQ
     • Can co-locate queue with data
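The mapping from message importance to ConsistencyLevel can be sketched as below. This is a minimal illustration, not a client API: the `ConsistencyLevel` enum, `required_acks`, `level_for`, and the message types in `SERVICE_LEVELS` are hypothetical names introduced here. The only Cassandra behavior assumed is that a quorum is a majority of the replicas for a key.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    """Write consistency levels named on the slide, plus ALL for "or more"."""
    ZERO = "ZERO"
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def required_acks(level: ConsistencyLevel, replication_factor: int) -> int:
    """Replica acknowledgements a write waits for before it is reported successful."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if level is ConsistencyLevel.ZERO:
        return 0                            # fire and forget
    if level is ConsistencyLevel.ONE:
        return 1                            # one replica has the message
    if level is ConsistencyLevel.QUORUM:
        return replication_factor // 2 + 1  # majority of replicas
    return replication_factor               # ALL


# Hypothetical message types, each mapped to the service level it needs.
SERVICE_LEVELS = {
    "metrics_ping": ConsistencyLevel.ZERO,     # would be nice to deliver
    "thumbnail_job": ConsistencyLevel.ONE,     # low latency, some durability
    "invoice_email": ConsistencyLevel.QUORUM,  # deliver at least once
}


def level_for(message_type: str) -> ConsistencyLevel:
    """Pick the level for a message type; unknown types default to QUORUM."""
    return SERVICE_LEVELS.get(message_type, ConsistencyLevel.QUORUM)
```

With a replication factor of 3, `required_acks(level_for("invoice_email"), 3)` evaluates to 2, so the same cluster can serve cheap and durable messages side by side.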
  8. Queue data models for Cassandra
     • Use rows as queues
         • Best performance for ordered messages
         • Scale limited to row size (but still huge by queue standards, and possible to partition)
     • Use column families as queues with RP (RandomPartitioner)
         • Distributes queue items over a cluster
         • No message ordering
     • Use column families as queues with OPP (OrderPreservingPartitioner)
         • Distributes less well over a cluster
         • Provides message ordering
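The "rows as queues" model can be pictured with a small in-memory stand-in. This sketch does not talk to Cassandra: `RowQueue` and its methods are hypothetical, and the `(timestamp, sequence)` column names are an assumption standing in for time-ordered column names that a row would keep sorted.

```python
import bisect
import itertools
import time


class RowQueue:
    """In-memory stand-in for one row used as a queue.

    Column names are (timestamp, sequence) pairs kept in sorted order, the way
    a time-ordered comparator would keep columns within a row.
    """

    def __init__(self):
        self._names = []               # sorted column names
        self._values = {}              # column name -> message payload
        self._seq = itertools.count()  # tie-breaker for equal timestamps

    def enqueue(self, payload, now=None):
        """Insert a message as a new column and return its column name."""
        name = (time.time() if now is None else now, next(self._seq))
        bisect.insort(self._names, name)
        self._values[name] = payload
        return name

    def peek(self, limit=10):
        """Return the oldest `limit` columns, like a slice from the row start."""
        return [(name, self._values[name]) for name in self._names[:limit]]

    def delete(self, name):
        """Remove a column once its message has been processed."""
        if name in self._values:
            del self._values[name]
            self._names.remove(name)
```

A consumer would call `peek`, process each message, and then `delete` its column; ordering comes from the sorted column names rather than from arrival order.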
  9. When you want or need “at most once” semantics
     • When things are idempotent, you don't
     • When trying to avoid resource contention or redundant computation
         • Possible to make single consumer the common case with smart consumers
         • memcached for imperfect but scalable/HA locking
     • When something absolutely cannot happen more than once
         • The “bank transfer” case
         • Give messages unique identity and use locking managed by consumers
         • Use a locking framework like ZooKeeper
         • Audit periodically for the effects of duplication and correct
         • Maybe don't use Cassandra...
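Consumer-managed locking keyed on a unique message identity might look like the sketch below. `LockStore` is a hypothetical in-memory stand-in for a shared store offering an atomic add-if-absent with expiry (the kind of operation memcached's `add` command provides); `consume` and its parameters are likewise assumptions for illustration. Because locks expire and the store can lose entries, this makes a single consumer the common case rather than a strict guarantee.

```python
import threading
import time


class LockStore:
    """In-memory stand-in for a shared store with an atomic add-if-absent."""

    def __init__(self):
        self._guard = threading.Lock()
        self._expiry = {}  # message id -> time at which the lock expires

    def add(self, key, ttl, now=None):
        """Claim `key` for `ttl` seconds; return False if it is already claimed."""
        now = time.monotonic() if now is None else now
        with self._guard:
            expires = self._expiry.get(key)
            if expires is not None and expires > now:
                return False
            self._expiry[key] = now + ttl
            return True


def consume(messages, locks, handler, ttl=30.0):
    """Handle each (message_id, payload) pair only if this consumer wins its lock."""
    handled = []
    for message_id, payload in messages:
        if locks.add(message_id, ttl):
            handler(payload)
            handled.append(message_id)
    return handled
```

Two consumers reading the same batch would each call `consume` against the same store; whichever claims a message id first handles it, and the other skips it.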
  10. What's annoying about Cassandra queues
     • Polling is necessary
         • Makes this bad for low-latency queues
     • Adding locking requires interfacing code with multiple systems
         • Even then, locking is usually optimistic rather than a coordinated reservation of work items
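The polling cost can be made concrete with a loop that backs off while the queue is empty. This is a generic sketch, not something shown in the slides: `poll`, `fetch`, `handle`, and the delay parameters are hypothetical, and `fetch` stands for whatever read the consumer issues against the queue.

```python
import time


def poll(fetch, handle, min_delay=0.1, max_delay=5.0, max_polls=None, sleep=time.sleep):
    """Repeatedly fetch messages, doubling the wait while the queue stays empty."""
    delay = min_delay
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        batch = fetch()
        if batch:
            for message in batch:
                handle(message)
            delay = min_delay  # traffic seen: go back to fast polling
            continue
        sleep(delay)           # empty read: wait before asking again
        delay = min(delay * 2, max_delay)
```

The worst-case delivery latency is bounded by `max_delay` plus the read time, which is the trade-off behind "bad for low-latency queues": a shorter delay means more empty reads against the cluster.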
  11. So, consider Cassandra for queuing if you have...
     • Different messages with different delivery importance
         • But most messages need to reach consumers “at least once”
     • Limited need for “at most once” guarantees
     • Too much message volume to handle throughput on a single node
     • Willingness to poll and have high latency