Everyone knows Cassandra as a NoSQL solution for data storage. But this data is often processed through message queues built on some external messaging provider, which occasionally leads to data inconsistency and adds another infrastructure layer to maintain. Since one of our services keeps all of its data in Cassandra, we developed a message queue solution on top of it that automatically gained a number of useful features: scalability, high availability and flexibility. This is the solution I will present in this talk.
19. System components: Message
A message is the real request data with a unique ID. The row is keyed by the message ID, and each field of the request is stored in its own column ({ field data }).

The ID format allows 4096 messages per millisecond from one node:
• Timestamp – 44 bits
• Counter – 12 bits
• Cluster node ID – 8 bits
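To make the layout concrete, here is a minimal sketch of how such a 64-bit ID could be packed in Python; the field order and the helper names are my assumptions, not the exact implementation behind the talk.

```python
import itertools
import time

# Assumed 64-bit layout: 44-bit millisecond timestamp | 12-bit counter | 8-bit node ID.
TIMESTAMP_BITS, COUNTER_BITS, NODE_BITS = 44, 12, 8

_counter = itertools.count()  # per-process counter, masked to 12 bits below

def new_message_id(node_id):
    """Pack timestamp, counter and cluster node ID into one 64-bit integer."""
    millis = int(time.time() * 1000) & ((1 << TIMESTAMP_BITS) - 1)
    counter = next(_counter) & ((1 << COUNTER_BITS) - 1)   # 4096 distinct IDs per millisecond
    node = node_id & ((1 << NODE_BITS) - 1)                # up to 256 cluster nodes
    return (millis << (COUNTER_BITS + NODE_BITS)) | (counter << NODE_BITS) | node

def unpack_message_id(message_id):
    """Split an ID back into (timestamp_ms, counter, node_id)."""
    node = message_id & ((1 << NODE_BITS) - 1)
    counter = (message_id >> NODE_BITS) & ((1 << COUNTER_BITS) - 1)
    millis = message_id >> (COUNTER_BITS + NODE_BITS)
    return millis, counter, node
```

Because the timestamp occupies the high bits, IDs generated on one node sort chronologically, which is what makes the ascending column ordering on the following slides work.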
20. System components: Batch
• Open for at least 1 second
• Closed if it has been open for > 10 seconds
• Closed if it contains > 100 messages (the closing rules are sketched below)

The row is keyed by the batch ID; message IDs are stored as columns in ascending order, each optionally carrying the message data ({ opt message data }).

The ID format is what requires a batch to stay open for > 1 second:
• Timestamp – rounded to seconds
• Cluster node ID + batch type – the last 3 digits
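The closing rules from the bullets above can be expressed as one small check; the one-second minimum follows from the batch ID being rounded to whole seconds (two batches opened by the same node within the same second would collide). The class and constant names are illustrative assumptions.

```python
import time

MIN_OPEN_SECONDS = 1    # the seconds-resolution ID forces a batch to live at least 1 s
MAX_OPEN_SECONDS = 10   # close if the batch has been open for more than 10 s
MAX_MESSAGES = 100      # close if the batch holds more than 100 messages

class Batch:
    def __init__(self, batch_id):
        self.batch_id = batch_id
        self.opened_at = time.time()
        self.message_ids = []

    def should_close(self):
        age = time.time() - self.opened_at
        if age < MIN_OPEN_SECONDS:
            return False   # never close before the ID's second has fully passed
        return age > MAX_OPEN_SECONDS or len(self.message_ids) > MAX_MESSAGES
```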
21. System components: Queue
• Similar to a batch
• Unlimited in size
• May contain batches with timestamps in the past

The row is keyed by the queue name; batch IDs are stored as columns in ascending order, each holding the time at which the batch was processed ({ processed at }).
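A hedged sketch of how the queue could look as a Cassandra table, with one wide row per queue and batch IDs as ascending clustering columns; the keyspace, table and column names are illustrative assumptions, not the actual schema from the talk.

```python
from cassandra.cluster import Cluster  # DataStax Python driver

session = Cluster(["127.0.0.1"]).connect("queues")  # assumed keyspace

session.execute("""
    CREATE TABLE IF NOT EXISTS queue (
        queue_name   text,
        batch_id     bigint,
        processed_at timestamp,
        PRIMARY KEY (queue_name, batch_id)
    ) WITH CLUSTERING ORDER BY (batch_id ASC)
""")

# Unprocessed batches are simply the columns whose processed_at is still empty;
# ascending batch_id ordering returns them in chronological order.
rows = session.execute(
    "SELECT batch_id, processed_at FROM queue WHERE queue_name = %s",
    ("incoming",),
)
pending = [r.batch_id for r in rows if r.processed_at is None]
```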
22. System components: Broker
The broker polls the queue for batches, checks each batch's time, locks the batch for processing in ZooKeeper, and hands it off to the processor (sketched below).

• Natural pre-fetch thanks to batches
• Easy to control message processing
• Simple concurrency model
• Easily scalable across nodes
• No high load on Cassandra
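A minimal sketch of the broker loop from the diagram, assuming a per-batch ZooKeeper lock (via kazoo) so that only one broker node processes a given batch; the names, the polling interval and the batch-ID arithmetic are illustrative assumptions.

```python
import time
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

def broker_loop(queue_name, fetch_batches, process_batch, poll_interval=1.0):
    """Poll the queue, skip batches that may still be open, lock and process the rest."""
    while True:
        for batch_id in fetch_batches(queue_name):              # batches polling
            batch_second = batch_id // 1000                     # assumed: ID prefix is a seconds timestamp
            if batch_second >= int(time.time()):                # check batch time: may still be open
                continue
            lock = zk.Lock(f"/locks/{queue_name}/{batch_id}")   # lock batch for processing
            if lock.acquire(blocking=False):
                try:
                    process_batch(batch_id)                     # hand the batch to the processor
                finally:
                    lock.release()
        time.sleep(poll_interval)
```

Because each batch already groups many messages, a single poll fetches work for the whole group (the natural pre-fetch above), and the per-batch lock is the only coordination point between broker nodes.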
26. System components: Processor
On failure, the processor redelivers the message into another batch (sketched below).

• Tries to process messages as quickly as possible
• On error, simply redelivers the message
• Messages are processed concurrently
• Any redelivery business logic is easy to implement
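A hedged sketch of the processor: the messages of a batch are handled concurrently, and a message that fails is handed to a redelivery callback (for example, appended to a later batch). The thread-pool approach and the function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def process_messages(messages, handle, redeliver, max_workers=8):
    """Process a batch's messages concurrently; on failure, redeliver the message."""
    def run(message):
        try:
            handle(message)        # business logic for a single message
        except Exception:
            redeliver(message)     # e.g. append the message to another batch
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(run, messages))  # block until the whole batch is done
```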
27. Warnings and benefits
• Messages and batches must be checked before processing
• The "queue" size is hard to explain
• Separate columns are used for tracking message status (see the sketch below)
• Correct compaction has to be performed from time to time
• The expected load is handled by a single node
• Everything works on commodity hardware
• A single storage for all the data
• The system is easily scalable and reliable (no message has been lost)
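To illustrate the first and third warnings, the message status could live in its own columns next to (but separate from) the payload, so the broker can re-check it cheaply before processing; counting pending messages across such wide data is also part of why the queue size is hard to explain. The table and column names below are assumptions.

```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("queues")  # same assumed keyspace as before

# Status columns are separate from the payload, so updating the status never
# rewrites the message data.
session.execute("""
    CREATE TABLE IF NOT EXISTS message (
        message_id bigint PRIMARY KEY,
        payload    blob,
        status     text,          -- e.g. 'pending' / 'done'
        updated_at timestamp
    )
""")

row = session.execute(
    "SELECT status FROM message WHERE message_id = %s", (1234,)
).one()
if row is not None and row.status != "done":   # check the message before processing it
    pass                                       # safe to process, then mark it as 'done'
```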