Andes: a Scalable persistent Messaging System
1. Andes: a Scalable persistent Messaging System
Charith Wickramarachchi, Srinath Perera, Shammi Jayasinghe, Sanjiva Weerawarana
WSO2 Inc.
http://www.flickr.com/photos/magnusvk/334474531/
2. Outline
Dimensions of Scale
Distributed Message Brokers
Andes Architecture
Distributed Pub/sub architecture
Distributed Queues architecture
Evaluation
Conclusion
photo by John Trainor on Flickr http://www.flickr.com/photos/trainor/2902023575/, Licensed under CC
3. Message Brokers (e.g. JMS, AMQP)
• A broker sits in the middle
• Users send messages and receive them based
on interests (asynchronous)
• Publish/Subscribe (Topic)
– deliver to all
• Distributed Queues (Queue)
– deliver to one, store and deliver, persistent
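The two delivery styles above can be sketched with a toy in-memory broker. This is only an illustration of the semantics, not JMS or AMQP code; `TinyBroker` and its method names are hypothetical.

```python
from collections import defaultdict, deque


class TinyBroker:
    """In-memory stand-in for a broker; not a JMS or AMQP implementation."""

    def __init__(self):
        self.topic_subscribers = defaultdict(list)  # topic -> list of inboxes
        self.queues = defaultdict(deque)            # queue -> stored messages

    def subscribe(self, topic):
        inbox = []
        self.topic_subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, message):
        # Topic semantics: every current subscriber gets its own copy.
        for inbox in self.topic_subscribers[topic]:
            inbox.append(message)

    def send(self, queue, message):
        # Queue semantics: the message is stored until someone receives it.
        self.queues[queue].append(message)

    def receive(self, queue):
        # Each stored message is handed to exactly one receiver.
        q = self.queues[queue]
        return q.popleft() if q else None


broker = TinyBroker()
alice = broker.subscribe("news")
bob = broker.subscribe("news")
broker.publish("news", "hello")   # both alice and bob see "hello"

broker.send("jobs", "job-1")
first = broker.receive("jobs")    # gets "job-1"
second = broker.receive("jobs")   # nothing left for a second receiver
```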
4. Messaging Systems in Real World
• Event Based Systems
– Sensor Networks
– System Monitoring
• CEP (Complex Event
Processing)
• Social Networks
• Real time Analytics
• Job queues/ scheduling
http://www.flickr.com/photos/imuttoo/4257813689/ by Ian Muttoo, http://www.flickr.com/photos/eastcapital/4554220770/,
http://www.flickr.com/photos/patdavid/4619331472/ by Pat David copyright CC
5. Challenges in Modern Message Oriented Middleware
Why? Advances in technology areas like cloud computing and
the growth of Internet-based user bases demand
scalable message oriented middleware.
Challenges
High Availability
Persistence
Scale (Dimensions of scale)
Number of messages
Number of Queues
Size of messages
Current messaging systems only scale in the first two dimensions
http://www.artelista.com/ypobra.php?o=19550
6. Distributed Message Brokers
A single broker node cannot scale up
Messaging systems often scale through a network of brokers, where
users can subscribe or publish (to both queues and topics) at any
node.
There are many algorithms and routing rules (e.g.
NaradaBrokering [9], Gryphon [10], Oracle Advanced Queuing
[7], TIBCO Rendezvous [8], IBM WebSphere MQ [6], and Padres
[11])
Still, ordered delivery with queues remains a challenge
7. Cassandra and Zookeeper
• Cassandra
– NoSQL store with a new data model (column family)
– Highly scalable (multiple nodes), highly available, and with no single
point of failure
– SQL-like query language (from 0.8) and search through
secondary indexes (but no JOINs, GROUP BY, etc.)
– Tunable consistency and replication
– Very high write throughput and good read throughput
• Zookeeper
– Scalable, fault tolerant distributed coordination
framework
8. Alternative Message Broker Design
• Most persistent message brokers use a per-node DB to
store messages and route messages between nodes.
• But with large messages, the cost of routing messages over
the network is very high.
• With the availability of scalable storage and distributed
coordination middleware, we propose an alternative
architecture for scalable message brokers.
• Main idea
– Avoid message routing
– Use scalable storage to share messages between nodes
– Use distributed coordination to control the behavior
9. Andes – Overview
Each node polls the queues for the subscriptions assigned to it
Andes stores message content separately
Delivery logic works with message IDs written to a queue
representation in Cassandra, and reads the message content only at delivery
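A minimal sketch of this split between message content and queued message IDs, using plain dictionaries as a stand-in for Cassandra. The class, the field names, and the storage layout are assumptions for illustration and do not reflect the actual Andes schema.

```python
import itertools


class SharedStore:
    """Dict-based stand-in for the shared storage (hypothetical layout)."""

    def __init__(self):
        self.content = {}      # message id -> payload
        self.queue_ids = {}    # queue name -> ordered list of message ids
        self._next_id = itertools.count(1)
        self.content_reads = 0

    def store_message(self, queue, payload):
        msg_id = next(self._next_id)
        self.content[msg_id] = payload                       # payload written once
        self.queue_ids.setdefault(queue, []).append(msg_id)  # only the id is queued
        return msg_id

    def peek_ids(self, queue, limit):
        # Returns a copy, so callers may modify the queue while iterating.
        return self.queue_ids.get(queue, [])[:limit]

    def read_content(self, msg_id):
        self.content_reads += 1
        return self.content[msg_id]


def poll_and_deliver(store, queue, deliver, limit=10):
    """One polling pass by the node that owns `queue`."""
    delivered = 0
    for msg_id in store.peek_ids(queue, limit):
        deliver(store.read_content(msg_id))   # payload is read only at delivery
        store.queue_ids[queue].remove(msg_id)
        delivered += 1
    return delivered


store = SharedStore()
store.store_message("orders", b"x" * 1024)
store.store_message("orders", b"y" * 1024)
reads_before_delivery = store.content_reads   # queueing touched ids only
received = []
count = poll_and_deliver(store, "orders", received.append)
```

Because the queue holds only IDs, the size of the payload does not affect the cost of the queue operations themselves in this sketch.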
10. Andes – Overview (Contd.)
Users can publish or subscribe at any node, and Andes
delivers messages to all subscribers as if the subscribe and publish
operations were done on the same node.
When a message is published, the receiving node writes it to
Cassandra
There are nodes assigned to handle each queue or
topic, and they read messages from Cassandra and send
them to subscribers
Uses Apache Zookeeper for coordination when needed
Supports AMQP, JMS, and WS-Eventing while enabling
interoperability between protocols
Built by extending the Apache Qpid code base
12. Publish/Subscribe design (Contd.)
• There is a queue representation implemented on
Cassandra
• Andes creates a queue for each subscription
• When a broker receives a published message, it stores
the message in a message store in Cassandra.
• The broker writes the message IDs of relevant messages to the
relevant subscriber queues based on the subscribed
topic.
• Each node polls the Cassandra queues for subscriptions made
at that node and delivers the messages to the subscribers.
• Message IDs are deleted from subscription queues after
acknowledgement, and Andes deletes messages from
the message store after a timeout.
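The flow above can be sketched as a small in-memory simulation with two broker nodes sharing one store. Everything here (class names, the `now` and `timeout` parameters, the dict layout) is a hypothetical stand-in for Cassandra and the real Andes code.

```python
class PubSubStore:
    """In-memory stand-in for the shared message store and subscription queues."""

    def __init__(self):
        self.messages = {}     # message id -> (payload, stored_at)
        self.sub_queues = {}   # subscription id -> list of message ids
        self.sub_topics = {}   # subscription id -> topic
        self.sub_nodes = {}    # subscription id -> node that owns it
        self.next_id = 0


class BrokerNode:
    def __init__(self, name, store):
        self.name = name
        self.store = store

    def subscribe(self, sub_id, topic):
        # One queue per subscription, owned by the node the subscriber used.
        self.store.sub_queues[sub_id] = []
        self.store.sub_topics[sub_id] = topic
        self.store.sub_nodes[sub_id] = self.name

    def publish(self, topic, payload, now):
        s = self.store
        s.next_id += 1
        msg_id = s.next_id
        s.messages[msg_id] = (payload, now)          # content stored once
        for sub_id, sub_topic in s.sub_topics.items():
            if sub_topic == topic:
                s.sub_queues[sub_id].append(msg_id)  # only ids are fanned out
        return msg_id

    def poll(self):
        """Pending (id, payload) pairs for subscriptions made at this node."""
        s = self.store
        out = {}
        for sub_id, node in s.sub_nodes.items():
            if node == self.name:
                out[sub_id] = [(m, s.messages[m][0])
                               for m in s.sub_queues[sub_id] if m in s.messages]
        return out

    def ack(self, sub_id, msg_id):
        self.store.sub_queues[sub_id].remove(msg_id)


def expire_messages(store, now, timeout):
    """Delete stored content older than `timeout`; returns the removed ids."""
    expired = [m for m, (_, t) in store.messages.items() if now - t >= timeout]
    for m in expired:
        del store.messages[m]
    return expired


store = PubSubStore()
node_a, node_b = BrokerNode("A", store), BrokerNode("B", store)
node_a.subscribe("s1", "alerts")
node_b.subscribe("s2", "alerts")
mid = node_b.publish("alerts", "disk full", now=0)  # published at node B
at_a = node_a.poll()                                # still visible at node A
at_b = node_b.poll()
node_a.ack("s1", mid)
node_b.ack("s2", mid)
removed = expire_messages(store, now=60, timeout=30)
```

Note that the nodes never talk to each other in this sketch; they only read and write the shared store.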
13. Distributed Queues
• Strict ordering means only one message can be
in delivery at a given time.
– Say we receive messages m1, m2 for Queue Q.
– Say we deliver messages m1 and m2 to clients c1 and
c2 for Queue Q in parallel
– Say m1->c1 failed, but by then m2->c2 is done.
– Even if there are no other subscribers, m1 now has to be
delivered out of order.
• Two implementations
– Strict ordering support - using a distributed shared
lock with Zookeeper
– Best effort implementation
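The m1/m2 scenario can be reproduced with a deterministic sketch. The flaky client, the retry policy, and the use of a local `threading.Lock` in place of a Zookeeper distributed lock are all assumptions made for illustration; the real implementations may differ.

```python
import threading


def make_flaky_client(fail_first_for):
    """Hypothetical client that rejects the first attempt for the given ids."""
    failed_once = set()
    log = []  # order in which messages were successfully delivered

    def deliver(msg):
        if msg in fail_first_for and msg not in failed_once:
            failed_once.add(msg)
            return False
        log.append(msg)
        return True

    return deliver, log


def best_effort(messages, deliver):
    # Attempt everything once, then retry failures; later messages may overtake.
    pending = list(messages)
    while pending:
        pending = [m for m in pending if not deliver(m)]


queue_lock = threading.Lock()  # local stand-in for a Zookeeper distributed lock


def strict(messages, deliver):
    # Only one message is in flight; the next starts after the current succeeds.
    for m in messages:
        with queue_lock:
            while not deliver(m):
                pass


deliver_a, order_best_effort = make_flaky_client({"m1"})
best_effort(["m1", "m2"], deliver_a)   # m1 fails once, so m2 lands first

deliver_b, order_strict = make_flaky_client({"m1"})
strict(["m1", "m2"], deliver_b)        # m1 is retried before m2 is attempted
```

The trade-off is visible in the sketch: the strict variant serializes delivery behind a lock, while the best-effort variant keeps going and accepts reordering on failure.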
16. Test Setup
• Test 1: Comparison with other Brokers
– Single Broker Node
– Changed the size of messages with different brokers
(with 40 publishers)
– Measured the throughput from subscribers for each
case after sending 10,000 messages.
• Test 2: Scalability test
– Multiple brokers
– 20 subscribers on the same queue
– Changed the number of publishers
– Measured the throughput from subscribers for each
case after sending 10,000 messages.
17. Comparison with Other Brokers
• Andes does much better than Qpid
• Andes does better than HornetQ for large
messages
18. Initial Scalability Results
• Adding more nodes improves throughput
• But more concurrency deteriorates the results
(needs more work)
19. How does it Make a difference?
• Scale up in all 3 dimensions
• Creates only one copy of a message during
delivery
• High Availability and Fault Tolerance
• File transfers in pub/sub (asynchronous style)
• Lets users choose between strict and best-effort
message delivery
• Replication of stored messages in the storage
20. Conclusion and Future Work
• Provides an alternative architecture for scalable
message brokers using Cassandra and Zookeeper
• It provides
– A publish/subscribe model that does not need any
coordination between broker nodes
– A strict mode for distributed queues that provides in
order delivery
– A best-effort mode for distributed queues
• Future work
– Further Scalability Tests
– Testing with large messages
– Fault Tolerance Tests
• Available as an open source project under the Apache
License.