The document discusses Toast's adoption and use of Apache Pulsar for asynchronous messaging in their microservices architecture. It describes how they built a "Pulsar Toggle" leveraging Envoy proxy to enable blue/green deployments of Pulsar consumers. The Pulsar Toggle allows consumers to be paused and resumed based on their status in the Envoy control plane, improving the reliability and usability of deploying changes to Pulsar-based services. Toast has seen increased adoption of Pulsar and benefits from its stability and scalability.
Blue-green deploys with Pulsar & Envoy
1. Pulsar Summit San Francisco
Hotel Nikko
August 18 2022
Use Case
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosystem
Kai Levy & Zach Walsh
Toast, Inc.
2. Kai and Zach both work on Toast’s Scale team, building shared infrastructure and solving problems of messaging, routing and persistence at scale.
Kai Levy
Senior Software Engineer
Toast
Zach Walsh
Senior Software Engineer
Toast
3. Agenda
Toast’s microservice ecosystem + Pulsar
Blue/green deployments at Toast
Driving Pulsar adoption
Our Envoy Proxy control plane
“The Pulsar Toggle”
4. We empower the restaurant community to delight their guests, do what they love, and thrive
9. A History of Pulsar at Toast
2018 Asynchronous messaging with RabbitMQ
● Order syncing between devices
● Change Data Capture (CDC)
2019 Pulsar pilot
● Initial exploration & testing
● Cluster productionalization
● First features, such as migrating change data capture
10. Persistence & Stability
Seamless Pulsar failover
● RabbitMQ: potential stability issues + in-memory data-storage = lost messages
○ Manual maintenance was a big burden
● Pulsar’s data replication & automatic topic balancing eliminated these concerns
11. Horizontal Scalability
[Diagram: broker 0, broker 1, broker 2, broker 3, … broker n]
● Supports adding more topics without manual provisioning
● Throughput has grown more than 5x without any change in architecture
12. A History of Pulsar at Toast
2020 Full-fledged adoption
● Teams across Toast rapidly built features on top of Pulsar to help restaurants survive the pandemic
● Decorated streams built on Pulsar, which enabled more scalable consumers
14. Full-fledged adoption
CDC data decorator service
[Diagram: Domain service (Source of Truth), notify-topic, CDC data decorator service, decorated-stream, consumers service1 … serviceN]
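A minimal sketch of what a decorator step could look like, assuming the notify-topic carries only an entity ID and the decorator looks up the full record from the source of truth before publishing to the decorated stream. The slide does not spell out this flow; `ORDERS`, `Notification`, and `decorate` are hypothetical names, and plain lists stand in for Pulsar topics.

```python
from dataclasses import dataclass

# Hypothetical source-of-truth data held by the domain service.
ORDERS = {"o-1": {"total": 1250, "status": "CLOSED"}}


@dataclass(frozen=True)
class Notification:
    """Thin change event: says what changed, not the new state."""
    order_id: str


def decorate(note: Notification, lookup=ORDERS.get) -> dict:
    """Enrich a thin notification with the full record from the source of truth."""
    record = lookup(note.order_id)
    if record is None:
        raise KeyError(f"unknown order {note.order_id}")
    return {"order_id": note.order_id, **record}


# Lists stand in for the notify-topic and the decorated-stream.
notify_topic = [Notification("o-1")]
decorated_stream = [decorate(n) for n in notify_topic]
```

Under this assumption, service1 … serviceN read complete records from the decorated stream and never need to call the domain service themselves.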
15. Full-fledged adoption
Order status notifications: Delivery & curbside arrival notifications for consumers, helping restaurants pivot to digital
Tip pool tracking: Tip pooling information is kept up-to-date with order information
Loyalty points accrual: Consumer-facing loyalty programs help Toast restaurants thrive
Restaurant availability: Third party platforms are notified when a restaurant goes offline
16. A History of Pulsar at Toast
2022 Next-gen order processing
● Critical replatforming projects in development will help Toast reach the next level of scale
● Event-driven architecture being widely used for new features
20. Authentication & authorization
● Automatic service authentication provided by the client libraries
○ Easy to use with any of our supported application frameworks
● Contributed a patch into the public Java client library
21. Dead-Letter Topics
● Standards for undeliverable messages
○ Per-subscription DLQs, or automatic acknowledgement after redelivery
○ Integrated with service configuration
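The two policies above can be sketched as follows. This is an illustration only, not Toast's client library or the Pulsar API: `OnExhausted`, `process`, and the in-memory `dlq` list are hypothetical stand-ins, and the retry loop simplifies what would really be negative acknowledgement followed by broker redelivery.

```python
from enum import Enum


class OnExhausted(Enum):
    DEAD_LETTER = "dead_letter"  # park the message on a per-subscription DLQ
    AUTO_ACK = "auto_ack"        # drop it by acknowledging automatically


def process(message, handler, max_redeliveries, on_exhausted, dlq):
    """Try the handler up to 1 + max_redeliveries times, then apply the policy.

    Returns "acked", "dead_lettered", or "auto_acked".
    """
    for _attempt in range(1 + max_redeliveries):
        try:
            handler(message)
            return "acked"
        except Exception:
            continue  # a real client would nack here and wait for redelivery
    if on_exhausted is OnExhausted.DEAD_LETTER:
        dlq.append(message)
        return "dead_lettered"
    return "auto_acked"
```

Either policy keeps a message that can never be processed from blocking the subscription indefinitely; the DLQ variant keeps it for later inspection.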
22. Topic registries with Terraform
Developers write infrastructure as code
● Started with in-house provider
○ Now migrating to StreamNative provider
● Lets us manage namespace authorization
● Provides defaults for retention & persistence
● Central place for discovering events
23. Metrics
● Automatically report over 2 dozen metrics
○ Consistent across services
● Critical for operations & monitoring
● Added our own custom metrics
● Adding APM integrations
[Screenshot labels: ackLatency, ackTimeouts, auto-acknowledgements]
24. Message Parsing
We parse Protobuf messages into friendly Kotlin data classes
● Our open-source, Kotlin-first protocol buffer compiler
● One-line usage for engineers building on our client
41. Agenda
Toast’s microservice ecosystem + Pulsar
Blue/green deployments at Toast (the solution)
Driving Pulsar adoption
Our Envoy Proxy control plane
“The Pulsar Toggle”
42. Pulsar operational tooling
Elevations & deploys weren’t easy on Pulsar
Contrast: REST services & Pulsar (in 2019)
Question | REST services | Pulsar consumers
Can I validate my deploy before prod traffic? | ✅ | ❌
Can I validate with a small amount of prod traffic? | ✅ | ❌
Can I easily roll back? | ✅ | ❌
Can I easily roll forward? | ✅ | ❌
43. Pulsar Consumer Elevation Requirements
1. Elevate traffic to new consumers as they are set to “active” in the control plane.
2. Avoid building a single point of failure.
3. Make this reusable for other background processes at Toast.
4. No performance hit or extra infrastructure.
44. Some options we considered
Message Router Pattern
[Diagram: a Router, informed by the Control Plane, sits between the incoming topic and a blue topic and a green topic, which feed Deploy N and Deploy N + 1]
45. Some options we considered
Message Router Pattern - Problems
[Diagram: same Message Router layout as the previous slide]
● But, the router is a single point of failure
● More infrastructure to monitor
● Two hops per message
46. Some options we considered
Feature Flags
● Apps use a feature flag to know whether to connect
● But, not integrated with our control plane
● Requires more setup for each consumer
[Diagram: incoming topic; Deploy N and Deploy N + 1; labels FF Off and FF On]
47. Some options we considered
Pausing Inactive Consumers
● The Feature Flag approach is close
○ No extra infrastructure
○ No extra hops
● But, we’d need to integrate it into our control plane
● Is this possible with Pulsar?
[Diagram: incoming topic; Deploy N and Deploy N + 1; labels inactive and active]
48. What does Pulsar provide?
Let’s see what the Pulsar source code has to say about pausing consumers.
In Consumer.java: [code excerpt not captured in this transcript]
49. Will pause() and resume() work?
What do operations look like if inactive consumers call pause()?
Question | Pulsar consumers | Pulsar consumers with pause()
Can I validate my deploy before prod traffic? | ❌ | ✅
Can I validate with a small amount of prod traffic? | ❌ | ❌
Can I easily roll back? | ❌ | ✅
Can I easily roll forward? | ❌ | ✅
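A toy model of the idea, assuming both deploys share one subscription and a paused consumer simply stops receiving new messages. `ToyConsumer` and `dispatch` are hypothetical stand-ins, not the Pulsar client; the sketch ignores messages already prefetched by a consumer at the moment it pauses.

```python
class ToyConsumer:
    """Stand-in for a Pulsar consumer; only models pause() and resume()."""

    def __init__(self, name, paused=False):
        self.name = name
        self.paused = paused
        self.received = []

    def pause(self):
        self.paused = True   # stop asking for new messages

    def resume(self):
        self.paused = False  # start asking for new messages again


def dispatch(messages, consumers):
    """Round-robin messages over consumers that are not paused.

    Returns the messages that nobody consumed (the backlog).
    """
    ready = [c for c in consumers if not c.paused]
    if not ready:
        return list(messages)
    for i, msg in enumerate(messages):
        ready[i % len(ready)].received.append(msg)
    return []


deploy_n = ToyConsumer("deploy-N")
deploy_n1 = ToyConsumer("deploy-N+1", paused=True)  # new deploy starts inactive

dispatch(["a", "b"], [deploy_n, deploy_n1])  # only deploy N sees traffic
deploy_n.pause()                             # elevate: flip active and inactive
deploy_n1.resume()
dispatch(["c"], [deploy_n, deploy_n1])       # now only deploy N + 1 sees traffic
```

Rolling back is the same flip in the other direction. Because a consumer here is either fully paused or fully active, the model also shows why the "small amount of prod traffic" row stays ❌.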
50. How Would You Solve This?
How do we get each consumer to call pause() or resume() at the right time?
● Pausing Pulsar consumers is easy. Knowing when to pause is hard.
● Central control plane component owns this data
● Let’s just poll that service
● What would that look like?
[Diagram: service Z polling the control plane]
51. What’s Wrong With This?
● Used to be the pattern for service discovery at Toast
● Subject to thundering herd
● Now, we leverage Envoy
[Diagram: service Z polling the control plane]
54. Envoy is a reverse proxy
Deployed as a sidecar, forwards requests to their destination
Envoy acts as a proxy, forwarding requests upstream.
[Diagram: my-service sends GET /menus/v2/menuItems to its envoy sidecar, which forwards GET /v2/menuItems to the menus service]
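The rewrite in the diagram can be sketched as a prefix lookup, assuming the sidecar picks the upstream from the first path segment and strips that prefix before forwarding. This is neither Envoy configuration nor Envoy's API; `ROUTES` and `route` are hypothetical names for illustration.

```python
# Hypothetical route table: path prefix -> upstream service name.
ROUTES = {"/menus": "menus"}


def route(path, routes=ROUTES):
    """Return (upstream, rewritten_path) for the longest matching prefix."""
    for prefix in sorted(routes, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return routes[prefix], path[len(prefix):] or "/"
    raise LookupError(f"no route for {path}")
```

With this table, `route("/menus/v2/menuItems")` yields the `menus` upstream and the path `/v2/menuItems`, matching the two requests shown on the slide.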
55. Envoy is eventually consistent
Routing changes are pushed asynchronously
Envoy sidecars across the fleet are pushed updates within ~1-2min of the change.
[Diagram: the Control Plane pushing updates to many sidecars]
56. Envoy knows service status
It gets a push each time any deploy goes active or inactive
We can leverage this to pause() or resume() consumers.
57. Envoy direct responses
Using an interesting Envoy feature to avoid single points of failure
It can intercept requests and reply with a direct response! This gets the status info into the process where the Consumer is running.
[Diagram: my-service sends GET /sidecar/v1/elevation/active to envoy, which (via *magic config*) replies directly with { "active": true }]
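A sketch of the application side of this exchange, with a small local HTTP server standing in for the Envoy sidecar. Only the path and the JSON body come from the slide; `FakeSidecar`, `fetch_active`, the loopback address, and the timeout are assumptions for illustration, and the real direct response is produced by Envoy configuration that the slide does not show.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

STATUS_PATH = "/sidecar/v1/elevation/active"  # path shown on the slide


class FakeSidecar(BaseHTTPRequestHandler):
    """Stand-in for Envoy answering locally instead of forwarding upstream."""

    active = True  # in the real system this reflects control plane state

    def do_GET(self):
        if self.path != STATUS_PATH:
            self.send_error(404)
            return
        body = json.dumps({"active": self.active}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_active(base_url, timeout=2.0):
    """Ask the local sidecar whether this deploy is active."""
    with urllib.request.urlopen(base_url + STATUS_PATH, timeout=timeout) as resp:
        return bool(json.load(resp)["active"])


server = HTTPServer(("127.0.0.1", 0), FakeSidecar)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
result = fetch_active(f"http://127.0.0.1:{server.server_port}")
server.shutdown()
server.server_close()
```

Because the request never leaves the host, every service instance can ask for its status as often as it likes without adding load to a central component.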
60. “Pulsar Toggle” implementation
Leveraging our Envoy Control Plane to toggle Pulsar consumers
A thread polls the locally-running Envoy instance and toggles the Pulsar consumer as needed
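A minimal sketch of such a polling thread, not Toast's implementation. It assumes a consumer object exposing `pause()` and `resume()` and a caller-supplied `fetch_active` function that asks the local sidecar for status; the class name, the 5-second interval, and the choice to keep the last known state when a poll fails are all assumptions.

```python
import threading


class PulsarToggle:
    """Polls a status source and pauses or resumes a consumer to match it."""

    def __init__(self, consumer, fetch_active, interval_s=5.0):
        self._consumer = consumer          # needs pause() and resume()
        self._fetch_active = fetch_active  # e.g. a call to the local sidecar
        self._interval_s = interval_s
        self._active = None                # unknown until the first poll
        self._stop = threading.Event()

    def poll_once(self):
        try:
            active = self._fetch_active()
        except Exception:
            return  # keep the last known state if the sidecar is unreachable
        if active == self._active:
            return  # no change, nothing to toggle
        if active:
            self._consumer.resume()
        else:
            self._consumer.pause()
        self._active = active

    def run(self):
        while not self._stop.is_set():
            self.poll_once()
            self._stop.wait(self._interval_s)

    def start(self):
        threading.Thread(target=self.run, daemon=True).start()

    def stop(self):
        self._stop.set()
```

Calling `pause()` or `resume()` only on a state change keeps the loop cheap, and because the toggle depends only on those two methods it could drive other background workers as well.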
61. Some “gotchas”
Eventually consistent: Consumers don’t pause immediately - updates propagate with some latency
Start paused: There wasn’t a way to subscribe in a paused state - we made a patch to the Java client
More advanced elevation patterns: Currently we can’t support percent elevations of Pulsar traffic onto new deploys
Receiver queue size: Critically important to tune this parameter of consumers
62. Results
~30 Toggle users in Prod, across Pulsar consumers & background workers
0 Outages: No added load on any critical systems
2 Contributions to open source - the Java client & the Camel integration
64. Users Love it!
Positive feedback from satisfaction surveys with our users
65% increase in reported ease of use when deploying Pulsar consumer changes
46% decrease in reported risk associated with deploying Pulsar consumer changes
65. Key Takeaways
Integration: Strong integration with existing systems is critical for org-wide adoption.
Ease of Use: As we make our Pulsar platform easier to use, we see more and more adoption.
Stability: Pulsar’s stability through big growth has been a killer feature for us.
66. Kai Levy & Zach Walsh
Thank you!
klevy@toasttab.com
zachary.walsh@toasttab.com
We’re Hiring!
careers.toasttab.com