Streaming Millions of Contact Center Interactions in (Near) Real-Time with Pulsar - Pulsar Summit NA 2021
Frank Kelly
Principal Engineer, Cogito Corp
Slack: https://apache-pulsar.slack.com/
A panoply of parameters
● Cogito & What we do
● Architecture & Use-Cases
● Challenges
● Initial lessons learned
● Kubernetes lessons learned
● Performance & Scaling settings
● Results
● Q&A
Intended Audience
Those who understand the main APIs and components but who may not be familiar with all the configuration settings, or with how to optimize the system for high write throughput and/or millions of topics.
Overview
Formed in 2007 out of MIT, based in Boston, now with a global engineering footprint.
Vision: Elevating the human connection in real time.
Product: Call center AI solution that analyzes the human voice and provides real-time guidance to enhance emotional intelligence and customer service.
Cogito: Who we are and what we do
● Streaming: real-time audio and analytic results from our AI/ML models
● We break each customer call into separate logical units called “intervals”
● Each interval is backed by two Pulsar topics
○ Real-time Audio Topic
○ Real-time Analytics Topic
● Splitting binary streams into discrete messages → deduplication is VERY important!
● With 15,000 concurrent users we estimate 1.5M to 2M topics per day
● Each topic has moderate throughput (~32 Kb/s)
● Also Messaging: work-queue events
Use-Cases for Pulsar
● Streaming Use-Case
○ Lots of throughput (~10 Gbps)
○ Message ordering & deduplication are critical
○ Near real-time requirements (< 250 ms)
■ Think about timeouts / retries / failover
● Challenges
○ ZooKeeper stores all the topics for a namespace under one ZNode
○ Brokers require more memory
● Alternatives considered
○ Using key_shared would require us to disable batching in the producer (not a huge deal); see the sketch after this slide
○ Risk: message dispatch will stop if there is a subscription/consumer that has built up a backlog of messages in their hash range
○ Filtering on the client side
The Challenges
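A minimal sketch of the Key_Shared alternative described above, using the Pulsar Java client. The service URL, topic, subscription name, and message key are illustrative placeholders, not values from our deployment; batching is disabled on the producer, as the slide notes.

    import java.util.concurrent.TimeUnit;
    import org.apache.pulsar.client.api.*;

    public class KeySharedSketch {
        public static void main(String[] args) throws Exception {
            PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://localhost:6650")            // illustrative URL
                    .build();

            // Producer with batching disabled, as Key_Shared required at the time
            Producer<byte[]> producer = client.newProducer()
                    .topic("persistent://tenant/ns/interval-audio")   // hypothetical topic
                    .enableBatching(false)
                    .create();

            producer.newMessage()
                    .key("interval-1234")                             // hash of the key selects the consumer
                    .value(new byte[0])
                    .send();

            // Key_Shared consumer: all messages for a given key go to the same consumer,
            // but a backlog in one consumer's hash range can stall dispatch for those keys
            Consumer<byte[]> consumer = client.newConsumer()
                    .topic("persistent://tenant/ns/interval-audio")
                    .subscriptionName("analytics")                    // hypothetical subscription
                    .subscriptionType(SubscriptionType.Key_Shared)
                    .subscribe();

            Message<byte[]> msg = consumer.receive(250, TimeUnit.MILLISECONDS);
            if (msg != null) {
                consumer.acknowledge(msg);
            }
            client.close();
        }
    }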
● Processing real-time binary streams
○ Consumer: SubscriptionInitialPosition.Earliest
○ Broker Configuration: brokerDeduplicationEnabled: "true"
● Client Performance (see the producer sketch below)
○ Producer: sendAsync() ~10x improvement
○ Producer: blockIfQueueFull(true)
○ Batching: enabled, but per-producer throughput is so low it rarely helps
● Default Timeouts
○ For our real-time system, the default connection/operation timeout of 30 s is too high
● Persistent vs. Non-Persistent
○ We support both use-cases (some customers wish for zero persistence)
Initial Lessons on the basics
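A hedged sketch of the client settings above, using the Pulsar Java client. The service URL, topic, subscription name, and 5-second timeouts are illustrative placeholders; only the option names come from the slide.

    import java.util.concurrent.TimeUnit;
    import org.apache.pulsar.client.api.*;

    public class RealTimeClientSketch {
        public static void main(String[] args) throws Exception {
            PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://broker:6650")                // illustrative URL
                    .connectionTimeout(5, TimeUnit.SECONDS)            // default 30s is too high for real-time
                    .operationTimeout(5, TimeUnit.SECONDS)             // default 30s is too high for real-time
                    .build();

            Producer<byte[]> producer = client.newProducer()
                    .topic("persistent://tenant/ns/interval-audio")    // hypothetical topic
                    .blockIfQueueFull(true)                            // back-pressure instead of queue-full exceptions
                    .enableBatching(true)                              // enabled, though per-producer throughput is low
                    .create();

            // sendAsync() instead of blocking send(): roughly 10x improvement in our tests
            producer.sendAsync("audio-chunk".getBytes())
                    .thenAccept(messageId -> { /* optionally record the message id */ });

            Consumer<byte[]> consumer = client.newConsumer()
                    .topic("persistent://tenant/ns/interval-audio")
                    .subscriptionName("stream-processor")              // hypothetical subscription
                    .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
                    .subscribe();

            producer.flush();
            client.close();
        }
    }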
● 15k Users ⇒ ingress of 5 Gbps Audio Data ⇒ 20 TB in a 12 hour window
● Open Subscriptions keep the topic data from being deleted
○ Code: pulsarAdmin.namespaces().setSubscriptionExpirationTime(namespace, expirationTimeMinutes); (see the sketch below)
○ Broker Deduplication has its own subscription
■ brokerDeduplicationEntriesInterval: "50" (default: 1000)
■ brokerDeduplicationProducerInactivityTimeoutMinutes: "15" (default: 360)
● Bookie Compaction Thresholds (Delete more and do it more frequently)
○ majorCompactionInterval / majorCompactionThreshold
○ minorCompactionInterval / minorCompactionThreshold
○ compactionRate
● Tiered Storage
○ Although we use some tiered storage, over time there would still be too many topics in ZooKeeper
○ Created our own stream offload that stores the S3 location in an RDS database
Disk Space Challenges
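A minimal sketch of the subscription-expiration call referenced above, using the Pulsar Java admin client. The admin URL, namespace, and 30-minute expiry are illustrative, not the values used in production.

    import org.apache.pulsar.client.admin.PulsarAdmin;

    public class ExpireIdleSubscriptionsSketch {
        public static void main(String[] args) throws Exception {
            PulsarAdmin admin = PulsarAdmin.builder()
                    .serviceHttpUrl("http://broker:8080")              // illustrative admin URL
                    .build();

            // Expire subscriptions that have been inactive for 30 minutes so their
            // backlog no longer pins ledger data on the bookies
            admin.namespaces().setSubscriptionExpirationTime("tenant/streaming", 30);  // minutes; hypothetical namespace

            admin.close();
        }
    }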
● Which Helm chart?
○ Apache Pulsar (“Official”) vs Streamnative (Also “Official”) vs Kafkaesque
● GC Settings
○ Java Ergonomics: -XX:+PrintFlagsFinal
○ GC Settings tied to Pod Memory: -Xms2g -Xmx2g -XX:MaxDirectMemorySize=6g
○ resources.requests.memory = Heap + Direct Memory + Some Buffer (worked example below)
○ Looking forward to seeing modern JVM settings, e.g. -XX:MaxRAMPercentage=75.0
● Most Helm charts set requests but not limits; we set requests == limits
○ JVM memory is not elastic
○ CPU is, but we experienced a lot of throttling from the K8s scheduler
● Istio Service Mesh
○ Integration with Istio for mTLS and service-level authorization took a chunk of time
Kubernetes Lessons
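As a worked example of the sizing rule above (illustrative numbers, not our exact production values): with -Xms2g -Xmx2g -XX:MaxDirectMemorySize=6g, the pod needs roughly 2 GiB of heap + 6 GiB of direct memory + 1-2 GiB of headroom for metaspace, thread stacks, and other native overhead, so resources.requests.memory = resources.limits.memory would land around 9-10 GiB, with requests equal to limits because JVM memory is not elastic.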
Active Monitoring with Prometheus Alerts
Integration with Prometheus Alerting to Slack / PagerDuty
● Namespace Bundles
○ For 15 Brokers: defaultNumberOfNamespaceBundles: "128" (Default: 4)
● Pulsar Load Balancer
○ # Disable Bundle split due to https://github.com/apache/pulsar/issues/5510
○ loadBalancerAutoBundleSplitEnabled: "false"
● Balancing throughput, durability and reliability across Bookies
○ managedLedgerDefaultEnsembleSize: "N"
○ managedLedgerDefaultWriteQuorum: "2"
○ managedLedgerDefaultAckQuorum: "1"
○ Striping is great for write throughput but adds cost for read throughput (see the worked example below)
Real-Time / Scaling Journey Lessons
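To make the striping trade-off concrete (illustrative numbers, not our production settings): with managedLedgerDefaultEnsembleSize=3, WriteQuorum=2, and AckQuorum=1, each entry is written to 2 of the 3 bookies in the ensemble and acknowledged as soon as 1 bookie confirms, and because the ensemble is larger than the write quorum, consecutive entries are striped round-robin across different bookie pairs. Writes therefore spread across more disks, but a reader has to contact several bookies to reassemble a single ledger, which is the read-throughput cost noted above.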
● Error
○ PerChannelBookieClient - Add for failed on bookie bookkeeper-2:3181 code EIO
Bookie EIO Error
Root Cause: at peak load the write cache was not big enough to hold the accumulated data while waiting on the second cache flush
● Key Prometheus Metrics
○ Bookie
■ bookie_throttled_write_requests
■ bookie_rejected_write_request
○ Broker
■ pulsar_ml_cache_hits_rate
■ pulsar_ml_cache_misses_rate
Bookie EIO Error
(Charts contrasting BAD and GOOD behavior for the cache and write metrics above)
Key Lesson
The more we read from the broker cache, the less we use the bookie ledger disk (enabling faster flush of write cache → ledger)
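As an example of wiring the metrics above into alerting (a sketch; the thresholds and exact rules are ours to pick, not from the talk): a Prometheus alert such as increase(bookie_rejected_write_request[5m]) > 0 catches bookies refusing writes, and a companion alert on pulsar_ml_cache_misses_rate staying high relative to pulsar_ml_cache_hits_rate flags when consumers fall out of the broker cache and start hitting the bookie ledger disks.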
● EBS drives for Journal & Ledger
○ GP3 with max settings 16000 IOPS, 1000 MB/s
● Broker Cache
○ managedLedgerCacheEvictionTimeThresholdMillis: "5000" (Default: 1000)
○ managedLedgerCacheSizeMB: "512" (Default: 20% of total direct Memory)
● Bookie
○ dbStorage_writeCacheMaxSizeMb: "3072" (Default: 25% of total direct memory)
○ dbStorage_rocksDB_blockCacheSize: "1073741824" (Default: 10% of total direct memory)
○ journalMaxGroupWaitMSec: "10" (Default: 1ms)
● Scaling approach
○ Scale-out Bookies
○ Scale-up and Scale-out Brokers
Key Scaling Settings . . .
We’re not at millions yet, but we’re seeing a trend…
1) Simulated 300 users for about 18 hours with artificially short 1-minute calls
2) 500k topics created (250k Audio / 250k Signal Analytics)
Latest Results
Observations: ZooKeeper
ZK disk usage increasing…
Suppressed: java.io.IOException: No space left on device
  at org.apache.zookeeper.server.SyncRequestProcessor$1.run(SyncRequestProcessor.java:135) [org.apache.pulsar-pulsar-zookeeper-2.6.1.jar:2.6.1]
  at org.apache.zookeeper.server.ZooKeeperServer.takeSnapshot(ZooKeeperServer.java:312) [org.apache.pulsar-pulsar-zookeeper-2.6.1.jar:2.6.1]
  at org.apache.zookeeper.server.persistence.FileTxnSnapLog.save(FileTxnSnapLog.java:406) ~[org.apache.pulsar-pulsar-zookeeper-2.6.1.jar:2.6.1]
  ...
[Snapshot Thread] ERROR org.apache.zookeeper.server.ZooKeeperServer - Severe unrecoverable error, exiting
Implications
1) ZooKeeper
a) More Heap
b) More CPU for GC (and to avoid throttling during GC)
c) Watch ZooKeeper disk space /pulsar/data
2) Broker
a) More Heap
b) Maybe more CPU for GC (and to avoid throttling during GC)
c) Watch for Broker → ZK latency issues
i) zooKeeperSessionTimeoutMillis: "60000" (default: 30000)
ii) zooKeeperOperationTimeoutSeconds: "60" (default: 30)
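One standard ZooKeeper setting worth pairing with the "watch disk space" point above (not covered in the talk, just plain ZooKeeper configuration): enable snapshot auto-purge, e.g. autopurge.snapRetainCount=3 and autopurge.purgeInterval=1 (hours), so old snapshots and transaction logs under the data directory are cleaned up automatically instead of accumulating until the volume fills.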
Thanks
Cogito
Bruce, Hamid, Andy, Jimmy, George, Gibby, Kyle, Matt, Amanda, John,
Ian, Mihai, Luis, Anthony, Karl and many more
Pulsar Community
Addison, Sijie, Matteo, Joshua etc.
● Benchmarking Pulsar and Kafka - A More Accurate Perspective on Pulsar’s Performance
○ https://streamnative.io/en/blog/tech/2020-11-09-benchmark-pulsar-kafka-performance#maximum-throughput-test
● Taking a Deep-Dive into Apache Pulsar Architecture for Performance Tuning
○ https://streamnative.io/en/blog/tech/2021-01-14-pulsar-architecture-performance-tuning
● Understanding How Apache Pulsar Works
○ https://jack-vanlightly.com/blog/2018/10/2/understanding-how-apache-pulsar-works
References