Kafka as a message queue
1. KAFKA AS A MQ
CAN YOU DO IT, AND SHOULD YOU DO IT?
Adam Warski, Apache Kafka London Meetup
2. @adamwarski, SoftwareMill, Kafka London Meetup
THE PLAN
➤ Acknowledgments in plain Kafka
➤ Why selective acknowledgments?
➤ Why not …MQ?
➤ Kmq implementation
➤ Demo
➤ Performance
ACKNOWLEDGMENTS IN PLAIN KAFKA
➤ Offset commits:
[Diagram: a topic with partitions 1-3; messages msg18 … msg25, with "commit offset: 20" and "commit offset: 24" marked]
➤ Using this, we can implement:
  ➤ at-least-once processing
  ➤ at-most-once processing
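As a rough illustration of the two guarantees, the sketch below models one partition as a Python list and the committed offset as an integer. It does not use any Kafka client; `consume`, `make_processor` and the simulated crash are hypothetical names introduced only for this example. The only difference between the two modes is whether the offset is committed before or after processing.

```python
def consume(log, committed, process, commit_first):
    """Process log[committed:] one message at a time; return the committed offset.

    commit_first=True  -> at-most-once  (commit, then process)
    commit_first=False -> at-least-once (process, then commit)
    A consumer crash is simulated by `process` raising RuntimeError.
    """
    offset = committed
    try:
        while offset < len(log):
            if commit_first:
                committed = offset + 1   # acknowledge before doing the work
                process(log[offset])
            else:
                process(log[offset])
                committed = offset + 1   # acknowledge only after success
            offset += 1
    except RuntimeError:
        pass  # the consumer "crashes"; a restart resumes from `committed`
    return committed


def make_processor(seen, fail_on):
    """Record each message in `seen`; crash the first time `fail_on` arrives."""
    state = {"failed": False}

    def process(msg):
        if msg == fail_on and not state["failed"]:
            state["failed"] = True
            raise RuntimeError("crash while processing " + msg)
        seen.append(msg)

    return process


log = ["msg18", "msg19", "msg20", "msg21"]

# At-least-once: msg20 is attempted again after the restart, nothing is lost.
seen_alo = []
p = make_processor(seen_alo, "msg20")
c = consume(log, 0, p, commit_first=False)
consume(log, c, p, commit_first=False)

# At-most-once: msg20 was already committed when the crash happened, so it is skipped.
seen_amo = []
p = make_processor(seen_amo, "msg20")
c = consume(log, 0, p, commit_first=True)
consume(log, c, p, commit_first=True)
```

In both modes the commit covers everything up to an offset, which is why a single failed message cannot be retried on its own.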
WHY SELECTIVE ACKNOWLEDGMENTS?
➤ Integrating with external systems
  ➤ e.g. HTTP/REST endpoints
  ➤ email
  ➤ other messaging
➤ Individual calls might fail
  ➤ should be retried
  ➤ without retrying the whole batch
  ➤ without delaying subsequent batches
WHY NOT …MQ?
➤ Typical usage scenario for a message queue
  ➤ RabbitMQ, ActiveMQ, Artemis, SQS …
➤ Kafka:
  ➤ proven & reliable clustering & replication mechanisms
  ➤ performance
  ➤ convenience: reduce operational complexity
AMAZON SQS
➤ Message queue as-a-service
➤ Simple API:
  ➤ CreateQueue
  ➤ SendMessage
  ➤ ReceiveMessage
  ➤ DeleteMessage
➤ Received messages are blocked for a period of time
  ➤ visibility timeout
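The visibility timeout can be pictured with a small in-memory model. This is only a sketch of the behaviour described above, not the AWS SDK: the class `VisibilityQueue` and its methods are hypothetical and merely mirror the operation names on the slide, and time is passed in explicitly so the example stays deterministic.

```python
class VisibilityQueue:
    """Toy queue: a received message is hidden until its visibility timeout expires."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self.messages = {}          # message id -> body
        self.invisible_until = {}   # message id -> time at which it becomes visible again
        self.next_id = 0

    def send_message(self, body):
        msg_id = self.next_id
        self.next_id += 1
        self.messages[msg_id] = body
        self.invisible_until[msg_id] = 0.0   # visible immediately
        return msg_id

    def receive_message(self, now):
        """Return (id, body) of the first visible message and hide it, or None."""
        for msg_id, body in self.messages.items():
            if self.invisible_until[msg_id] <= now:
                self.invisible_until[msg_id] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete_message(self, msg_id):
        """Acknowledge a single message; it will never be delivered again."""
        self.messages.pop(msg_id, None)
        self.invisible_until.pop(msg_id, None)


q = VisibilityQueue(visibility_timeout=30)
q.send_message("a")
first = q.receive_message(now=0)        # delivered, then hidden for 30 time units
hidden = q.receive_message(now=10)      # still hidden: nothing to receive
redelivered = q.receive_message(now=30) # not deleted in time, so delivered again
q.delete_message(redelivered[0])
after_delete = q.receive_message(now=100)
```

Deleting a message acknowledges it individually, and a message that is received but never deleted simply reappears after the timeout.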
KMQ: IMPLEMENTATION
➤ Two topics:
  ➤ queue: messages to process
  ➤ markers: for each message, start/end markers
  ➤ same number of partitions
➤ A number of queue clients
  ➤ here data is processed
➤ A number of redelivery trackers
QUEUE CLIENT
1. Read message from queue
2. Write start [offset] to markers
  ➤ wait for send to complete!
3. Commit offset to queue
4. Process the message
5. Write end [offset] to markers
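The five steps can be sketched against in-memory stand-ins for a single partition. This is an illustrative model under stated assumptions, not the KmqClient API: `run_client` and `unacknowledged` are hypothetical helpers, the queue and markers topics are plain lists, and appending to a list stands in for a send that has already completed.

```python
def run_client(queue, markers, committed, process):
    """Apply the five queue-client steps to every unread message of one partition."""
    offset = committed
    while offset < len(queue):
        msg = queue[offset]                  # 1. read message from queue
        markers.append(("start", offset))    # 2. write start marker (a real client waits for the send)
        committed = offset + 1               # 3. commit offset to queue
        try:
            process(msg)                     # 4. process the message
        except Exception:
            pass                             # no end marker: the message is left for redelivery
        else:
            markers.append(("end", offset))  # 5. write end marker
        offset += 1
    return committed


def unacknowledged(markers):
    """Offsets that have a start marker but no end marker."""
    started = {offset for kind, offset in markers if kind == "start"}
    ended = {offset for kind, offset in markers if kind == "end"}
    return sorted(started - ended)


queue = ["msg37", "msg38", "msg39"]
markers = []


def process(msg):
    if msg == "msg38":
        raise ValueError("external call failed")


committed = run_client(queue, markers, 0, process)
pending = unacknowledged(markers)
```

Because the offset is committed before processing, a failed message does not hold back the messages after it; the missing end marker is the only record that it still needs to be handled.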
[Diagram: message flow between the queue topic, the markers topic (3 partitions each), a queue client and a redelivery tracker]
➤ Queue client: 1. read messages from topic (msg39, msg40); 2. write start markers (start marker, offset: 39); 3. commit offsets (offset: 38); 4. process message (msg37); 5. write end markers (end marker, offset: 37)
➤ Processing succeeds: confirm message processed with an end marker
➤ Processing fails: no end marker is written; wait for redelivery
➤ Redelivery tracker: reads the marker stream and keeps started, not ended markers (offset=10, time=1488010644; offset=15, time=1488141843; offset=24, time=1488289812; …)
➤ Every second: trigger redelivery of timed-out messages (read & redeliver message, e.g. msg10)
REDELIVERY TRACKER
➤ A Kafka application
➤ consumes the markers topic
➤ Multiple instances for fail-over
➤ Uses Kafka’s auto-partition-assignment
REDELIVERY TRACKER
➤ In-memory priority queue
  ➤ by Kafka’s marker timestamp
  ➤ messages with start markers, but no end markers
➤ Checks for messages to redeliver at regular intervals
  ➤ redelivery: seek + send
  ➤ in order
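A priority queue keyed by marker timestamp could look roughly like the sketch below. It is a simplified model and not the kmq implementation: the class `RedeliveryTracker`, its methods and the `timeout` parameter are assumptions made for this example, and the actual redelivery (seek + send) is reduced to returning the offsets that are due.

```python
import heapq


class RedeliveryTracker:
    """Tracks start markers without a matching end marker, oldest first."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.heap = []          # (marker timestamp, offset); smallest timestamp on top
        self.in_flight = set()  # offsets with a start marker but no end marker yet

    def on_marker(self, kind, offset, timestamp):
        if kind == "start":
            self.in_flight.add(offset)
            heapq.heappush(self.heap, (timestamp, offset))
        else:
            # End marker: the heap entry is dropped lazily in due()
            self.in_flight.discard(offset)

    def due(self, now):
        """Pop and return the offsets whose timeout has passed, in timestamp order."""
        result = []
        while self.heap and self.heap[0][0] + self.timeout <= now:
            _, offset = heapq.heappop(self.heap)
            if offset in self.in_flight:      # ignore messages that were ended meanwhile
                self.in_flight.discard(offset)
                result.append(offset)
        return result


tracker = RedeliveryTracker(timeout=60)
tracker.on_marker("start", 10, 1488010644)
tracker.on_marker("start", 15, 1488141843)
tracker.on_marker("start", 24, 1488289812)
tracker.on_marker("end", 15, 1488141850)

first_check = tracker.due(now=1488200000)
second_check = tracker.due(now=1488300000)
```

Calling `due` at regular intervals corresponds to the periodic check; each returned offset would then be read from the queue topic and sent again.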
PERFORMANCE
➤ 3-node Kafka cluster
➤ m4.2xlarge servers (8 CPUs, 32GiB RAM)
➤ single AZ
➤ 100 byte messages, sent in batches of up to 10
➤ Up to 8 sender/receiver nodes
➤ 64 to 160 partitions
➤ replication-factor=3
➤ min.insync.replicas=2
➤ acks=all (-1)
KMQ INTERNALS
➤ RedeliveryTracker
  ➤ Implemented in Scala, with a Java API
  ➤ Uses Akka
    ➤ One tracking actor per markers topic partition
    ➤ One redeliver actor per queue topic partition
    ➤ Started/stopped when partitions are assigned/revoked
➤ KmqClient
  ➤ Single Java class
  ➤ + marker value classes
ABOUT ME
➤ Software engineer, co-founder @ SoftwareMill
➤ Custom software development: Scala/Kafka/Java/Cassandra/…
➤ Open-source: sttp, QuickLens, ElasticMQ, Envers, MacWire, …
➤ Blog @ softwaremill.com/blog
➤ Twitter @ twitter.com/adamwarski
SUMMARY
➤ Individual, selective message acknowledgments
➤ similar to SQS
➤ Alternative to batch/up-to-offset acknowledgments in plain Kafka
➤ Storage overhead: additional meta-data topic
➤ Performance overhead: comparable
➤ Integrating with external systems