Pushing Pulsar Performance to the Limits - Pulsar Summit NA 2021

© 2021 SPLUNK INC.
Pushing Pulsar Performance to the Limit
Pulsar Summit 2021
Sr

Scaling Up and Out!
Apache Pulsar and Apache BookKeeper scale out and up really well!

Scaling Out
Pulsar brokers and BookKeeper servers (bookies)

Scaling Out
Topic partitions

Scaling Up
More CPUs, more memory, more disks!

This all costs money!
● Pay for:
○ CPU
○ Memory
○ Disk IO
○ Disk size
○ Network (across AZs)
● Compression
○ Reduce network utilization
○ Reduce disk utilization and space
○ Potentially increase CPU utilization
● Reduce the replication factor
○ Pulsar can operate at a replication factor of 2
■ Other messaging systems have a minimum of 3 for high availability.
○ Trades off some safety for cost
● Can we do more?
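
To make the replication factor and compression trade-off concrete, here is a minimal arithmetic sketch. The function name `stored_bytes` and the example figures are hypothetical and not from the talk; it only assumes that every replica stores the full (compressed) payload.

```python
def stored_bytes(payload_bytes: float,
                 replication_factor: int,
                 compression_ratio: float = 1.0) -> float:
    """Bytes written to bookie disks for a given producer payload.

    compression_ratio = compressed size / original size (hypothetical input).
    Assumes each replica stores the full compressed payload.
    """
    return payload_bytes * compression_ratio * replication_factor


one_tb = 1e12  # illustrative payload size, not a measured figure
rf3 = stored_bytes(one_tb, replication_factor=3)
rf2 = stored_bytes(one_tb, replication_factor=2)

# Fraction of disk space and replication traffic saved by going from 3 to 2
saving = 1 - rf2 / rf3
print(f"RF 3 -> RF 2 saves {saving:.0%} of stored bytes")
```

Under these assumptions, dropping from three replicas to two removes one third of the stored bytes, which is the cost side of the safety trade-off listed above.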

The BookKeeper Journal
● A journal is also known as a Write-Ahead Log (WAL)
● Common in data systems, especially databases
● What is a journal for?
○ Typically for atomicity and durability
○ Allows BookKeeper to provide strong durability while sidestepping some nasty performance issues

BookKeeper Entry Log Files
Strategy 1: No journal, write straight to entry log files. One entry log file per ledger (corresponds to a topic partition)
Entries are stored in entry log files
[Diagram: writes and reads go to a separate entry log file per ledger, giving random IO]

BookKeeper Entry Log Files
Strategy 2: No journal, write multiple ledgers to the same active entry log file
Entries are stored in entry log files
[Diagram: two options for writing to the shared entry log file]
○ Write as they come in: sequential IO for writes, random IO for reads
○ Buffer and sort: sequential IO for reads, but what about write latency?

Optimizing for reads and writes
Strategy 3: Journal + Caches + Sorted Entry Log files
[Diagram: writes go to the journal and the caches; reads are served from the caches or the entry log file]
○ Journal: write as they come in, sequential IO (writes), low write latency
○ Caches: many reads don’t hit disk
○ Entry log file: buffer and sort, sequential IO (reads)
○ Costs: double-write, provisioning
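
Strategy 3 can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions, not BookKeeper's implementation: the class `ToyBookie` and its methods are hypothetical names, lists stand in for files, and there is no fsync, index or read cache.

```python
from collections import defaultdict


class ToyBookie:
    """Toy model of journal + write cache + sorted entry log (names are hypothetical)."""

    def __init__(self):
        self.journal = []                     # append-only, in arrival order
        self.write_cache = defaultdict(dict)  # ledger_id -> {entry_id: payload}
        self.entry_log = []                   # flushed (ledger_id, entry_id, payload)
        self.disk_reads = 0                   # reads that had to go to the entry log

    def add_entry(self, ledger_id, entry_id, payload):
        # First write: sequential append to the journal, then the add can be acked.
        self.journal.append((ledger_id, entry_id, payload))
        self.write_cache[ledger_id][entry_id] = payload

    def flush(self):
        # Second write: buffer and sort, so each ledger's entries are contiguous.
        for ledger_id in sorted(self.write_cache):
            for entry_id in sorted(self.write_cache[ledger_id]):
                payload = self.write_cache[ledger_id][entry_id]
                self.entry_log.append((ledger_id, entry_id, payload))
        self.write_cache.clear()
        self.journal.clear()  # flushed entries no longer need the journal

    def read_entry(self, ledger_id, entry_id):
        cached = self.write_cache.get(ledger_id, {}).get(entry_id)
        if cached is not None:
            return cached  # served from cache, no disk access
        self.disk_reads += 1
        for logged_ledger, logged_entry, payload in self.entry_log:
            if (logged_ledger, logged_entry) == (ledger_id, entry_id):
                return payload
        return None


bookie = ToyBookie()
for ledger_id, entry_id in [(2, 0), (1, 0), (2, 1), (1, 1)]:
    bookie.add_entry(ledger_id, entry_id, f"L{ledger_id}-E{entry_id}")
```

Every entry is written twice in this model (once to `journal`, once to `entry_log`), which is the double-write cost named on the slide.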

● Double-write = double disk IO
○ Single disk = lower throughput
○ Multiple disks = more cost
● More complex provisioning:
○ Journal disk and entry data disk have different sizing requirements

Can we turn the journal off?
● Apache Kafka writes to the page cache and doesn’t fsync every write
○ It can lose entries on a crash or power loss, but the cluster remains OK as long as one copy exists
● Let’s turn off the journal...
What could possibly go wrong?

With the journal off
What could possibly go wrong?
● Pulsar isn’t like Kafka. Each topic is a segment-based log.
[Diagram: a topic made of Ledger 1, Ledger 2 and Ledger 3, written by Broker A. Broker B takes over: recover + close the open ledger, then create + append to Ledger 4]

Ledger Recovery
● When a Pulsar broker recovers a ledger, it:
○ Finds out which entries got committed (Ack Quorum)
○ Ensures all committed entries are fully replicated (Write Quorum)
○ Closes the ledger with Last Entry Id = last committed entry
Read, repair and close
           Bookie 1    Bookie 2    Bookie 3
1. Read    0, 1, 2, 3  0, 1, 2     0          Last committed entry = 2
2. Repair  0, 1, 2, 3  0, 1, 2     0, 1, 2
3. Close
Ledger metadata:
ensembles:
- 0 -> b1, b2, b3
last entry id: 2

Ledger Recovery
Determining if an entry is committed or not
WQ  AQ  B1           B2           B3       Entry status
3   2   OK           OK           pending  Committed
3   2   NoSuchEntry  NoSuchEntry  pending  Uncommitted
3   2   OK           NoSuchEntry  Error    Don’t know
3   2   Error        Error        pending  Don’t know
2   2   OK           OK           n/a      Committed
2   2   NoSuchEntry  pending      n/a      Uncommitted
2   2   Error        pending      n/a      Don’t know
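
The table can be expressed as a small decision function. This is a sketch of one rule that reproduces every row above, not BookKeeper's actual code: the function name `classify_entry` is hypothetical, and the "Uncommitted" threshold (WQ - AQ + 1 explicit negatives, i.e. too few bookies left to ever form an ack quorum) is an assumption inferred from the rows.

```python
def classify_entry(write_quorum, ack_quorum, responses):
    """Classify an entry from the bookie responses seen so far.

    responses: one string per bookie in the write quorum, each of
    "OK", "NoSuchEntry", "Error" or "pending".
    """
    ok = responses.count("OK")
    negative = responses.count("NoSuchEntry")

    # Enough positive responses to prove the entry reached an ack quorum.
    if ok >= ack_quorum:
        return "Committed"
    # Enough explicit negatives that an ack quorum can no longer exist.
    if negative >= write_quorum - ack_quorum + 1:
        return "Uncommitted"
    # Errors and pending responses prove nothing either way.
    return "Don't know"


print(classify_entry(3, 2, ["OK", "NoSuchEntry", "Error"]))
```

Note that only an explicit NoSuchEntry counts toward "Uncommitted"; an error is never treated as a negative.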

Ledger Recovery
● What happens if bookie 2 loses its data?
○ The recovery protocol loses our data
                        Bookie 1    Bookie 2  Bookie 3
1. Before the failure   0, 1, 2, 3  0, 1, 2   0
2. Bookie 2 loses data  0, 1, 2, 3  -         0
3. Recovery + repair    0, 1, 2, 3  0         0
4. Close
Ledger metadata:
ensembles:
- 0 -> b1, b2, b3
last entry id: 0
Data loss!!!
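
The data-loss scenario can be replayed with a small simulation. It is a simplified model under stated assumptions: the function `recover_last_entry_id` is a hypothetical name, every bookie answers every read (no errors or timeouts), and an entry counts as committed only when at least AQ bookies return it.

```python
def recover_last_entry_id(bookies, ack_quorum=2):
    """Return the entry id a recovering client would close the ledger at.

    bookies: one set of stored entry ids per bookie in the ensemble.
    Returns -1 if not even entry 0 is committed.
    """
    last_committed = -1
    entry_id = 0
    while True:
        ok = sum(1 for entries in bookies if entry_id in entries)
        if ok >= ack_quorum:
            last_committed = entry_id
            entry_id += 1
        else:
            # Everyone else answered NoSuchEntry, so the entry is
            # treated as uncommitted and recovery stops here.
            return last_committed


before_loss = [{0, 1, 2, 3}, {0, 1, 2}, {0}]
after_loss = [{0, 1, 2, 3}, set(), {0}]  # bookie 2 came back empty

print(recover_last_entry_id(before_loss))
print(recover_last_entry_id(after_loss))
```

In this model the empty bookie's truthful "NoSuchEntry" is what tips entries 1 and 2 into "uncommitted", even though they had been acknowledged.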

We need to change the BookKeeper replication protocol

Tweaking the Protocol
Detecting when data loss may have happened
Limbo: Turning a “NO” into an “I DON’T KNOW”

Tweaking the Protocol
Detecting when data loss may have happened
● Solution: a bookie that restarts after an abrupt termination:
○ Detects the unclean shutdown
○ Places all non-closed ledgers in “limbo”
○ Repair: scans its index and compares it against the metadata -> sources missing entries from peers
○ Once repaired, clears limbo status
○ While in limbo:
■ Responds to all Last Add Confirmed reads with the UNKNOWN response code
■ Never responds with an explicit negative (NoSuchEntry / NoSuchLedger); responds with UNKNOWN instead
[Diagram: the restarted bookie finds out its ledgers from the metadata, then gets missing entries from its peers]
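
The bookie-side behaviour described above can be sketched as follows. This is an illustration, not the BookKeeper implementation: the class `LimboBookie`, its method names and the dictionary-based storage are all hypothetical, and repair is reduced to copying whatever a single peer holds.

```python
class LimboBookie:
    """Toy bookie that enters limbo after an unclean shutdown (hypothetical API)."""

    def __init__(self, entries_by_ledger, open_ledgers, unclean_shutdown):
        # ledger_id -> {entry_id: payload}
        self.entries = {l: dict(e) for l, e in entries_by_ledger.items()}
        # After an unclean shutdown, every non-closed ledger is suspect.
        self.limbo = set(open_ledgers) if unclean_shutdown else set()

    def read_entry(self, ledger_id, entry_id):
        if entry_id in self.entries.get(ledger_id, {}):
            return "OK"
        # In limbo, a missing entry may simply have been lost: never say "no".
        if ledger_id in self.limbo:
            return "UNKNOWN"
        return "NoSuchEntry"

    def repair(self, ledger_id, peer_entries):
        """Copy entries we lack from a peer, then clear limbo for the ledger."""
        stored = self.entries.setdefault(ledger_id, {})
        for entry_id, payload in peer_entries.items():
            stored.setdefault(entry_id, payload)
        self.limbo.discard(ledger_id)


restarted = LimboBookie({}, open_ledgers=[7], unclean_shutdown=True)
print(restarted.read_entry(7, 1))             # answered while in limbo
restarted.repair(7, {0: "a", 1: "b", 2: "c"})
print(restarted.read_entry(7, 1))             # answered after repair
```

The key property is that an explicit negative is only given once limbo has been cleared, so a client can trust every NoSuchEntry it receives.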

Ledger Recovery with Limbo Status
                                        Bookie 1    Bookie 2  Bookie 3
1. Before the failure                   0, 1, 2, 3  0, 1, 2   0
2. Bookie 2 loses data, enters limbo    0, 1, 2, 3  limbo     0
3. Read Entry 1                         OK          Unknown   NoSuchEntry   Last committed entry = UNKNOWN
4. Data repair complete, limbo cleared  0, 1, 2, 3  0, 1, 2   0
5. Recovery: read, repair and close     0, 1, 2, 3  0, 1, 2   0, 1, 2
Ledger metadata:
ensembles:
- 0 -> b1, b2, b3
last entry id: 2
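
The effect of limbo on ledger recovery can be replayed with a small simulation. It is a simplified model, not the real client: `recover_with_limbo` and the `LIMBO` marker are hypothetical names, a bookie in limbo is assumed to hold nothing and answer UNKNOWN to every read, and the "uncommitted" threshold of WQ - AQ + 1 explicit negatives is an assumption.

```python
LIMBO = None  # marker for a bookie that answers UNKNOWN to every read


def recover_with_limbo(bookies, ack_quorum=2):
    """Return the last committed entry id, or "UNKNOWN" if it cannot be decided.

    bookies: per bookie, either a set of stored entry ids or LIMBO.
    """
    write_quorum = len(bookies)
    last_committed = -1
    entry_id = 0
    while True:
        healthy = [b for b in bookies if b is not LIMBO]
        ok = sum(1 for b in healthy if entry_id in b)
        negative = len(healthy) - ok
        if ok >= ack_quorum:
            last_committed = entry_id
            entry_id += 1
        elif negative >= write_quorum - ack_quorum + 1:
            return last_committed  # provably uncommitted: safe to close here
        else:
            return "UNKNOWN"       # cannot decide: do not close the ledger


in_limbo = [{0, 1, 2, 3}, LIMBO, {0}]
repaired = [{0, 1, 2, 3}, {0, 1, 2}, {0}]

print(recover_with_limbo(in_limbo))
print(recover_with_limbo(repaired))
```

While bookie 2 is in limbo the recovery cannot close the ledger at entry 0, and once the repair completes it closes at entry 2, matching the steps above.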

Getting Confidence in the Protocol Change
• We modelled the BookKeeper replication protocol in TLA+ earlier in the year
• Extended it to include:
– Arbitrary data loss
– Limbo status

Replication Factor, Toblerone & Maltesers
What has chocolate got to do with distributed messaging systems?
The surprising truth!

Replication Factor and Decoupled, Segmented Logs
[Diagram: a ledger split into Fragment 1, Fragment 2, Fragment 3 and Fragment 4, with an ensemble change between consecutive fragments]

Replication Factor & Segment Based Decoupled Storage Topics
[Diagram: two topics, one with Ledger 1 to Ledger 4 and one with Ledger 1 to Ledger 3, with their ledgers mapped onto bookies (Bookie 1, 2, 3, 4 and 6)]
