© 2021 SPLUNK INC.
Pushing Pulsar Performance to the Limit
Pulsar Summit NA 2021
Sr
Scaling Up and Out!
Apache Pulsar and Apache BookKeeper scale out and up really well!
Scaling Out
Pulsar Brokers and BookKeeper Servers (bookies)
Scaling Out
Topic partitions
Scaling Up
More CPUs, more memory, more disks!
This all costs money!
● Pay for:
○ CPU
○ Memory
○ Disk IO
○ Disk size
○ Network (across AZs)
● Compression
○ Reduce network utilization
○ Reduce disk utilization and space
○ Potentially increase CPU utilization
● Reduce the replication factor
○ Pulsar can operate at a replication factor of 2
■ Other messaging systems have a minimum of 3 for high availability.
○ Trades off some safety for cost
● Can we do more?
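The cost levers above can be put into rough numbers. The following is a minimal sketch, assuming a hypothetical workload of 10 TB/day and a hypothetical 2x compression ratio; the function name and figures are illustrative, not from the talk. It only models bytes written to bookie disks, although network and disk IO can be expected to move in the same direction.

```python
def stored_bytes_per_day(ingest_bytes_per_day, replication_factor, compression_ratio=1.0):
    """Bytes that land on bookie disks per day.

    compression_ratio is uncompressed size / compressed size, so 2.0 means
    the payload shrinks to half. All inputs here are hypothetical.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return ingest_bytes_per_day / compression_ratio * replication_factor


TB = 10**12
daily_ingest = 10 * TB  # hypothetical workload

rf3 = stored_bytes_per_day(daily_ingest, replication_factor=3)
rf2 = stored_bytes_per_day(daily_ingest, replication_factor=2)
rf2_compressed = stored_bytes_per_day(daily_ingest, replication_factor=2, compression_ratio=2.0)

for label, value in [("RF=3", rf3), ("RF=2", rf2), ("RF=2 + 2x compression", rf2_compressed)]:
    print(f"{label}: {value / TB:.0f} TB/day")
```

Under these assumptions, dropping from a replication factor of 3 to 2 removes a third of the stored bytes, and compression stacks on top of that at the price of some CPU.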
The BookKeeper Journal
● A journal is also known as a Write-Ahead Log (WAL)
● Common in data systems, especially databases
● What is a journal for?
○ Typically for atomicity and durability
○ Allows BookKeeper to provide strong durability while sidestepping some nasty performance issues
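To make the write-ahead idea concrete, here is a minimal sketch of a journal in Python. It is not BookKeeper's journal format: the `Journal` class, the 20-byte record header and the `replay` helper are all assumptions made for illustration. The point it shows is that entries from any ledger are appended in arrival order and an append only returns after `fsync`, so a restart can replay what was acknowledged.

```python
import os
import tempfile


class Journal:
    """Append-only write-ahead log (hypothetical record layout)."""

    def __init__(self, path):
        self._file = open(path, "ab")

    def append(self, ledger_id, entry_id, payload):
        # Header: 8-byte ledger id, 8-byte entry id, 4-byte payload length.
        header = (
            ledger_id.to_bytes(8, "big")
            + entry_id.to_bytes(8, "big")
            + len(payload).to_bytes(4, "big")
        )
        self._file.write(header + payload)
        self._file.flush()
        os.fsync(self._file.fileno())  # durable before the caller is acknowledged

    def close(self):
        self._file.close()


def replay(path):
    """Read back every complete record; stop at a torn (partial) record."""
    entries = []
    with open(path, "rb") as f:
        while True:
            header = f.read(20)
            if len(header) < 20:
                break
            ledger_id = int.from_bytes(header[0:8], "big")
            entry_id = int.from_bytes(header[8:16], "big")
            size = int.from_bytes(header[16:20], "big")
            payload = f.read(size)
            if len(payload) < size:
                break
            entries.append((ledger_id, entry_id, payload))
    return entries


path = os.path.join(tempfile.mkdtemp(), "journal.log")
journal = Journal(path)
journal.append(7, 0, b"first")
journal.append(3, 0, b"other ledger")
journal.append(7, 1, b"second")
journal.close()

recovered = replay(path)
print(recovered)
```

This sketch calls `fsync` once per entry for simplicity; batching several entries per `fsync` is a common way to keep the latency cost down.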
BookKeeper Entry Log Files
Strategy 1: No journal, write straight to entry log files. One entry log file per ledger (a ledger is one segment of a topic partition).
Entries are stored in entry log files.
[Diagram: writes and reads spread across many entry log files, which results in random IO]
BookKeeper Entry Log Files
Strategy 2: No journal, write multiple ledgers to the same active entry log file.
Entries are stored in entry log files.
[Diagram: two ways to write to the shared entry log file. "Write as they come in" gives sequential IO for writes but random IO for reads. "Buffer and sort" puts a question mark over write latency.]
The BookKeeper Journal
Optimizing for reads and writes
Strategy 3: Journal + Caches + Sorted Entry Log files
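[Diagram: writes go to the journal as they come in (sequential IO for writes, low write latency). Entries are buffered and sorted before being written to the entry log file (sequential IO for reads), and caches mean many reads don't hit disk. The costs called out are the double-write and provisioning.]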
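Strategy 3 can be sketched as a small in-memory model. This is an illustration under stated assumptions, not BookKeeper's implementation: `BookieSketch`, its `flush_threshold` and the list-backed "files" are hypothetical, and only a write cache is modelled. It shows the two writes per entry (journal in arrival order, entry log in sorted order) and reads being served from the cache before touching the entry log.

```python
class BookieSketch:
    """Toy model of journal + write cache + sorted entry log."""

    def __init__(self, flush_threshold=4):
        self.journal = []      # arrival order; stands in for the fsynced journal
        self.write_cache = {}  # (ledger_id, entry_id) -> payload
        self.entry_log = []    # flushed entries, sorted by ledger then entry
        self.flush_threshold = flush_threshold

    def add_entry(self, ledger_id, entry_id, payload):
        # Write 1: append to the journal as the entry comes in.
        self.journal.append((ledger_id, entry_id, payload))
        self.write_cache[(ledger_id, entry_id)] = payload
        if len(self.write_cache) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write 2: buffer and sort, so each ledger's entries end up contiguous.
        for ledger_id, entry_id in sorted(self.write_cache):
            payload = self.write_cache[(ledger_id, entry_id)]
            self.entry_log.append((ledger_id, entry_id, payload))
        self.write_cache.clear()

    def read_entry(self, ledger_id, entry_id):
        key = (ledger_id, entry_id)
        if key in self.write_cache:
            return self.write_cache[key], "cache"
        for logged_ledger, logged_entry, payload in self.entry_log:
            if (logged_ledger, logged_entry) == key:
                return payload, "entry log"
        raise KeyError(key)


bookie = BookieSketch(flush_threshold=4)
for ledger_id, entry_id in [(1, 0), (2, 0), (1, 1), (2, 1)]:
    bookie.add_entry(ledger_id, entry_id, f"L{ledger_id}-E{entry_id}".encode())
bookie.add_entry(1, 2, b"L1-E2")  # stays in the cache until the next flush

journal_order = [(l, e) for l, e, _ in bookie.journal]
entry_log_order = [(l, e) for l, e, _ in bookie.entry_log]
print(journal_order)
print(entry_log_order)
print(bookie.read_entry(1, 2), bookie.read_entry(2, 0))
```

The journal keeps the interleaved arrival order while the entry log groups each ledger's entries together, which is what makes later reads of one ledger sequential.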
The BookKeeper Journal
● Double-write = double disk IO
○ Single disk = lower throughput
○ Multiple disks = more cost
● More complex provisioning:
○ Journal disk and entry data disk have different sizing requirements
Can we turn the journal off?
● Apache Kafka writes to the page cache and doesn’t fsync every write
○ It can lose entries on a crash or power loss, but the cluster remains OK as long as one copy exists
● Let’s turn off the journal...
What could possibly go wrong?
With the journal off
● Pulsar isn’t like Kafka. Each topic is a segment-based log.
What could possibly go wrong?
[Diagram: a topic made of Ledger 1, Ledger 2 and Ledger 3, written by Broker A. Broker B recovers and closes the last ledger, then creates Ledger 4 and appends to it.]
Ledger Recovery
● When a Pulsar broker recovers a ledger, it:
○ Finds out which entries got committed (Ack Quorum)
○ Ensures all committed entries are fully replicated (Write Quorum)
○ Closes the ledger with Last Entry Id = last committed entry
Read, repair and close
1. Read: Bookie 1 holds entries 0, 1, 2, 3; Bookie 2 holds 0, 1, 2; Bookie 3 holds 0. Last committed entry = 2.
2. Repair: Bookie 1 holds 0, 1, 2, 3; Bookie 2 holds 0, 1, 2; Bookie 3 holds 0, 1, 2.
3. Close:
Ledger metadata:
  ensembles:
    - 0 -> b1, b2, b3
  last entry id: 2
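The read, repair and close sequence can be sketched in a few lines. This is a simplified model rather than the BookKeeper client code: `recover_ledger` is a hypothetical name, every bookie is assumed to be reachable and to answer truthfully, and the write quorum is assumed to be the whole ensemble.

```python
from typing import Dict, Set


def recover_ledger(bookies: Dict[str, Set[int]], ack_quorum: int) -> dict:
    """Read, repair and close a ledger; returns the closing metadata."""
    if ack_quorum < 1:
        raise ValueError("ack_quorum must be at least 1")

    # 1. Read: walk entry ids in order while at least ack_quorum bookies hold them.
    last_committed = -1
    while sum((last_committed + 1) in entries for entries in bookies.values()) >= ack_quorum:
        last_committed += 1

    # 2. Repair: make sure every bookie holds every committed entry.
    for entries in bookies.values():
        entries.update(range(last_committed + 1))

    # 3. Close: record the last committed entry as the last entry id.
    return {"ensembles": {0: sorted(bookies)}, "last_entry_id": last_committed}


bookies = {"b1": {0, 1, 2, 3}, "b2": {0, 1, 2}, "b3": {0}}
metadata = recover_ledger(bookies, ack_quorum=2)
print(metadata)
print(bookies)
```

With the bookie contents from the slide, entry 3 is held by only one bookie, so the ledger closes at entry 2 and Bookie 3 is topped up with entries 1 and 2.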
Ledger Recovery
Determining if an entry is committed or not
(WQ = Write Quorum, AQ = Ack Quorum, B1 to B3 = response from each bookie)
WQ  AQ  B1           B2           B3       Entry status
3   2   OK           OK           pending  Committed
3   2   NoSuchEntry  NoSuchEntry  pending  Uncommitted
3   2   OK           NoSuchEntry  Error    Don’t know
3   2   Error        Error        pending  Don’t know
2   2   OK           OK           n/a      Committed
2   2   NoSuchEntry  pending      n/a      Uncommitted
2   2   Error        pending      n/a      Don’t know
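The table can be read as a small decision rule. The sketch below is inferred from the rows above and is not taken from the BookKeeper source: `entry_status` is a hypothetical helper, and the thresholds (AQ positive answers to commit, WQ - AQ + 1 explicit negatives to rule an entry out) are the assumption that reproduces every row.

```python
def entry_status(write_quorum, ack_quorum, responses):
    """Classify an entry from the bookie responses seen so far.

    responses holds one of "OK", "NoSuchEntry", "Error" or "pending" per bookie.
    """
    ok = responses.count("OK")
    missing = responses.count("NoSuchEntry")
    if ok >= ack_quorum:
        return "Committed"
    # With this many explicit negatives, an ack quorum can no longer exist.
    if missing >= write_quorum - ack_quorum + 1:
        return "Uncommitted"
    return "Don't know"


rows = [
    (3, 2, ["OK", "OK", "pending"]),
    (3, 2, ["NoSuchEntry", "NoSuchEntry", "pending"]),
    (3, 2, ["OK", "NoSuchEntry", "Error"]),
    (3, 2, ["Error", "Error", "pending"]),
    (2, 2, ["OK", "OK"]),
    (2, 2, ["NoSuchEntry", "pending"]),
    (2, 2, ["Error", "pending"]),
]
statuses = [entry_status(wq, aq, responses) for wq, aq, responses in rows]
for (wq, aq, responses), status in zip(rows, statuses):
    print(wq, aq, responses, "->", status)
```

Note how an error never counts as a negative: only an explicit NoSuchEntry can push an entry towards "Uncommitted".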
Ledger Recovery
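● What happens if Bookie 2 loses its data?
○ The recovery protocol loses our data
Read, repair and close
1. Before the failure: Bookie 1 holds entries 0, 1, 2, 3; Bookie 2 holds 0, 1, 2; Bookie 3 holds 0.
2. Bookie 2 loses its data: Bookie 1 holds 0, 1, 2, 3; Bookie 2 holds nothing; Bookie 3 holds 0.
3. Read and repair: Last committed entry = 0. Bookie 1 holds 0, 1, 2, 3; Bookie 2 holds 0; Bookie 3 holds 0.
4. Close:
Ledger metadata:
  ensembles:
    - 0 -> b1, b2, b3
  last entry id: 0
Data loss!!! Entries 1 and 2 had been committed, yet the ledger is closed at entry 0.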
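Why the protocol draws the wrong conclusion can be seen by looking at the responses for entry 1 before and after the loss. This is a sketch under the same assumed threshold as the recovery table (WQ - AQ + 1 explicit negatives rule an entry out); `read_responses` and `ruled_uncommitted` are hypothetical helpers.

```python
def read_responses(entry_id, bookies):
    """What each bookie answers when asked for entry_id."""
    return ["OK" if entry_id in entries else "NoSuchEntry" for entries in bookies.values()]


def ruled_uncommitted(responses, write_quorum, ack_quorum):
    """True once enough explicit negatives exist that no ack quorum is possible."""
    return responses.count("NoSuchEntry") >= write_quorum - ack_quorum + 1


healthy = {"b1": {0, 1, 2, 3}, "b2": {0, 1, 2}, "b3": {0}}
after_loss = {"b1": {0, 1, 2, 3}, "b2": set(), "b3": {0}}  # Bookie 2 lost its data

before = read_responses(1, healthy)
after = read_responses(1, after_loss)
print(before, ruled_uncommitted(before, write_quorum=3, ack_quorum=2))
print(after, ruled_uncommitted(after, write_quorum=3, ack_quorum=2))
```

The empty bookie answers NoSuchEntry in good faith, and that second negative is enough for recovery to treat a committed entry as never having been committed.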
We need to change the BookKeeper replication protocol
Tweaking the Protocol
Detecting when data loss may have happened
Limbo: Turning a “NO” into an “I DON’T KNOW”
Tweaking the Protocol
Detecting when data loss may have happened
● Solution: a bookie that restarts after an abrupt termination:
○ Detects the unclean shutdown
○ Places all non-closed ledgers in “limbo”
○ Repair: scans its index and compares it against metadata, then sources missing entries from peers
○ Once repaired, clears limbo status
○ While in limbo:
■ Responds to all Last Add Confirmed reads with the UNKNOWN response code
■ Never responds with an explicit negative (NoSuchEntry / NoSuchLedger); it responds with UNKNOWN instead
[Diagram labels: “Find out my ledgers”, “Get missing” (twice)]
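The effect of limbo on recovery can be sketched with a toy bookie. This is an illustration of the idea, not the actual patch: the `Bookie` class, its method names and the `entry_status` rule (AQ positives commit, WQ - AQ + 1 explicit negatives rule out) are assumptions, and how the repair step chooses which entries to fetch is left out by passing them in directly.

```python
class Bookie:
    """Toy bookie that can answer "Unknown" while its ledger is in limbo."""

    def __init__(self, entries=()):
        self.entries = set(entries)
        self.limbo = False

    def restart_after_unclean_shutdown(self):
        # Assumes the unclean shutdown was detected; non-closed ledgers enter limbo.
        self.limbo = True

    def complete_repair(self, entries_from_peers):
        # Missing entries have been sourced from peers, so limbo can be cleared.
        self.entries.update(entries_from_peers)
        self.limbo = False

    def read(self, entry_id):
        if entry_id in self.entries:
            return "OK"
        # In limbo, a missing entry is "I don't know", never an explicit "no".
        return "Unknown" if self.limbo else "NoSuchEntry"


def entry_status(responses, write_quorum, ack_quorum):
    if responses.count("OK") >= ack_quorum:
        return "committed"
    if responses.count("NoSuchEntry") >= write_quorum - ack_quorum + 1:
        return "uncommitted"
    return "unknown"


b1, b2, b3 = Bookie({0, 1, 2, 3}), Bookie(), Bookie({0})  # b2 has lost its data

without_limbo = entry_status([b.read(1) for b in (b1, b2, b3)], 3, 2)

b2.restart_after_unclean_shutdown()
in_limbo_responses = [b.read(1) for b in (b1, b2, b3)]
with_limbo = entry_status(in_limbo_responses, 3, 2)

b2.complete_repair({0, 1, 2})
after_repair = entry_status([b.read(1) for b in (b1, b2, b3)], 3, 2)

print(without_limbo, in_limbo_responses, with_limbo, after_repair)
```

Without limbo the committed entry 1 is judged uncommitted. With limbo the same read is inconclusive, so recovery cannot close the ledger short, and once the repair finishes the entry is correctly seen as committed.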
Ledger Recovery with Limbo Status
Read, repair and close
1. Before the failure: Bookie 1 holds entries 0, 1, 2, 3; Bookie 2 holds 0, 1, 2; Bookie 3 holds 0.
2. Bookie 2 restarts in limbo: Bookie 1 holds 0, 1, 2, 3; Bookie 2 is in limbo; Bookie 3 holds 0.
3. Read Entry 1: Bookie 1 answers OK, Bookie 2 answers Unknown, Bookie 3 answers NoSuchEntry. Last committed entry = UNKNOWN.
4. Data repair complete, limbo cleared: Bookie 1 holds 0, 1, 2, 3; Bookie 2 holds 0, 1, 2; Bookie 3 holds 0.
5. Read, repair and close: Last committed entry = 2. Bookie 1 holds 0, 1, 2, 3; Bookie 2 holds 0, 1, 2; Bookie 3 holds 0, 1, 2.
Ledger metadata:
  ensembles:
    - 0 -> b1, b2, b3
  last entry id: 2
Getting Confidence in the Protocol Change
• We modelled the BookKeeper replication protocol in TLA+ earlier in the year
• Extended it to include:
– Arbitrary data loss
– Limbo status
Without limbo - data loss invariant violation
Counterexample
How Pulsar can run safely at a replication factor of 2...
Spoiler alert! It’s BookKeeper!
Replication Factor, Toblerone & Maltesers
What has chocolate got to do with distributed messaging systems?
The surprising truth!
Replication Factor and Monolithic Logs
[Diagram labels: Available, Unavailable]
Replication Factor and Decoupled, Segmented Logs
[Diagram: a log split into Fragment 1, Fragment 2, Fragment 3 and Fragment 4, with an ensemble change between consecutive fragments]
Replication Factor & Segment Based Decoupled Storage Topics
[Diagram: two views of a topic's ledgers (Ledger 1 to Ledger 4, and Ledger 1 to Ledger 3), each ledger stored on its own set of bookies drawn from Bookie 1, 2, 3, 4 and 6]
Thank you!
Questions...

Pushing Pulsar Performance to the Limits - Pulsar Summit NA 2021

  • 1. © 2021 SPLUNK INC. Pushing Pulsar Performance to the Limit Pulsar Summit 2021 Sr
  • 2. © 2021 SPLUNK INC. Scaling Apache Pulsar and Apache BookKeeper scale out and up really well! Up and Out!
  • 3. © 2021 SPLUNK INC. Pulsar Brokers and BookKeeper Servers (bookies) Scaling Out
  • 4. © 2021 SPLUNK INC. Topic partitions Scaling Out
  • 5. © 2021 SPLUNK INC. More CPUs, more memory, more disks! Scaling Up
  • 6. © 2021 SPLUNK INC. This all costs money! ● Pay for: ○ CPU ○ Memory ○ Disk IO ○ Disk size ○ Network (across AZs) ● Compression ○ Reduce network utilization ○ Reduce disk utilization and space ○ Potentially increase CPU utilization ● Reduce the replication factor ○ Pulsar can operate at a replication factor of 2 ■ Other messaging systems have a minimum of 3 for high availability. ○ Trades off some safety for cost ● Can we do more?
  • 7. © 2021 SPLUNK INC. The BookKeeper Journal ● A journal is also known as a Write-Ahead Log (WAL) ● Common in data systems, especially databases ● What is a journal for? ○ Typically for atomicity and durability ○ Allows BookKeeper to provide strong durability while sidestepping some nasty performance issues
  • 8. © 2021 SPLUNK INC. BookKeeper Entry Log Files Strategy 1: No journal, write straight to entry log files. One Entry Log file per ledger (corresponds to a topic partition) Entries are stored in Entry Log files writes reads writes reads Random IO
  • 9. © 2021 SPLUNK INC. BookKeeper Entry Log Files Strategy 2: No journal, write multiple ledgers to the same active entry log file Entries are stored in Entry Log files writes reads write as they come in writes reads Sequential IO buffer and sort ? Random IO (Reads) Write Latency
  • 10. © 2021 SPLUNK INC. The BookKeeper Journal Optimizing for reads and writes Strategy 3: Journal + Caches + Sorted Entry Log files Entry Log file writes Sequential IO (writes) Low write latency buffer and sort Journal write as they come in Caches reads Sequential IO (reads) Many reads don’t hit disk Double-write Provisioning
  • 11. © 2021 SPLUNK INC. The BookKeeper Journal ● Double-write = double disk IO ○ Single disk = lower throughput ○ Multiple disks = more cost ● More complex provisioning: ○ Journal disk and entry data disk have different sizing requirements Optional subtitle
  • 12. Can we turn the journal off?
      ● Apache Kafka writes to the page cache and doesn’t fsync every write
          ○ It can lose entries on a crash or power loss, but the cluster remains OK as long as one copy exists
      ● Let’s turn off the journal... What could possibly go wrong?
  • 13. With the journal off: what could possibly go wrong?
      ● Pulsar isn’t like Kafka. Each topic is a segment-based log.
      [Diagram: a topic made up of Ledger 1, Ledger 2 and Ledger 3, with Broker A and Broker B; labels "Recover + Close", and "Create + Append" on Ledger 4.]
  • 14. Ledger Recovery: read, repair and close
      ● When a Pulsar broker recovers a ledger it:
          ○ Finds out which entries were committed (Ack Quorum)
          ○ Ensures all committed entries are fully replicated (Write Quorum)
          ○ Closes the ledger with Last Entry Id = last committed entry
      [Diagram, in three steps:
        1. Before recovery: Bookie 1 holds entries 0, 1, 2, 3; Bookie 2 holds 0, 1, 2; Bookie 3 holds 0.
        2. Last committed entry = 2. After repair: Bookie 1 holds 0, 1, 2, 3; Bookie 2 holds 0, 1, 2; Bookie 3 holds 0, 1, 2.
        3. Ledger metadata: ensembles: 0 -> b1, b2, b3; last entry id: 2.]
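The three recovery steps above can be sketched as a toy function. The assumptions are loud ones: every bookie responds, the ensemble is the same size as the write quorum, and an entry counts as committed when at least ack-quorum bookies hold it. `recover_ledger` and the dict-of-sets model are hypothetical and are not the BookKeeper client API.

```python
def recover_ledger(bookies, ack_quorum):
    """Read, repair and close a toy ledger.

    `bookies` maps a bookie name to the set of entry ids it stores.
    """
    last_committed = -1
    entry_id = 0
    while True:
        copies = sum(1 for entries in bookies.values() if entry_id in entries)
        if copies < ack_quorum:
            break  # first entry that never reached the ack quorum
        for entries in bookies.values():
            entries.add(entry_id)  # repair: replicate to the full write quorum
        last_committed = entry_id
        entry_id += 1
    # Close: record the last committed entry in the ledger metadata.
    return {"ensembles": {0: sorted(bookies)}, "last_entry_id": last_committed}


bookies = {"b1": {0, 1, 2, 3}, "b2": {0, 1, 2}, "b3": {0}}
metadata = recover_ledger(bookies, ack_quorum=2)
print(metadata["last_entry_id"])  # 2
print(sorted(bookies["b3"]))      # [0, 1, 2]
```

Entry 3 exists only on bookie 1, so it never reached the ack quorum and the ledger is closed at entry 2, matching the slide.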
  • 15. Ledger Recovery: determining if an entry is committed or not
        WQ  AQ  B1           B2           B3       Entry status
        3   2   OK           OK           pending  Committed
        3   2   NoSuchEntry  NoSuchEntry  pending  Uncommitted
        3   2   OK           NoSuchEntry  Error    Don’t know
        3   2   Error        Error        pending  Don’t know
        2   2   OK           OK           n/a      Committed
        2   2   NoSuchEntry  pending      n/a      Uncommitted
        2   2   Error        pending      n/a      Don’t know
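One rule that reproduces every row of the table is sketched below. The thresholds are an assumption read off the table rather than quoted from the slides: an entry is treated as committed once AQ bookies answer OK, and as uncommitted once WQ - AQ + 1 bookies answer NoSuchEntry, because the ack quorum can then no longer have been reached. `entry_status` is a hypothetical name.

```python
def entry_status(responses, write_quorum, ack_quorum):
    """Classify one entry from the bookie responses seen so far."""
    ok = responses.count("OK")
    missing = responses.count("NoSuchEntry")
    if ok >= ack_quorum:
        return "Committed"
    if missing >= write_quorum - ack_quorum + 1:
        return "Uncommitted"
    return "Don't know"  # errors or pending responses leave it undecided


rows = [
    (3, 2, ["OK", "OK", "pending"]),
    (3, 2, ["NoSuchEntry", "NoSuchEntry", "pending"]),
    (3, 2, ["OK", "NoSuchEntry", "Error"]),
    (3, 2, ["Error", "Error", "pending"]),
    (2, 2, ["OK", "OK"]),
    (2, 2, ["NoSuchEntry", "pending"]),
    (2, 2, ["Error", "pending"]),
]
for wq, aq, responses in rows:
    print(wq, aq, responses, "->", entry_status(responses, wq, aq))
```

With WQ = 2 and AQ = 2 a single NoSuchEntry is already decisive, since both bookies would have had to acknowledge the entry for it to be committed.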
  • 16. Ledger Recovery: read, repair and close
      ● What happens if bookie 2 loses its data?
          ○ The recovery protocol loses our data
      [Diagram, in four steps:
        1. Before the failure: Bookie 1 holds entries 0, 1, 2, 3; Bookie 2 holds 0, 1, 2; Bookie 3 holds 0.
        2. Bookie 2 loses its data: Bookie 1 holds 0, 1, 2, 3; Bookie 2 holds nothing; Bookie 3 holds 0.
        3. Last committed entry = 0. After repair: Bookie 1 holds 0, 1, 2, 3; Bookie 2 holds 0; Bookie 3 holds 0.
        4. Ledger metadata: ensembles: 0 -> b1, b2, b3; last entry id: 0. Data loss!!!]
  • 17. We need to change the BookKeeper replication protocol
  • 18. Tweaking the Protocol: detecting when data loss may have happened. Limbo: turning a “NO” into an “I DON’T KNOW”
  • 19. Tweaking the Protocol: detecting when data loss may have happened
      ● Solution. A bookie that restarts after an abrupt termination:
          ○ Detects the unclean shutdown
          ○ Places all non-closed ledgers in “limbo”
          ○ Repair: scans its index and compares it against the metadata, then sources missing entries from peers
          ○ Once repaired, clears the limbo status
          ○ While in limbo:
              ■ Responds to all Last Add Confirmed reads with an UNKNOWN response code
              ■ Never responds with an explicit negative (NoSuchEntry / NoSuchLedger); responds with UNKNOWN instead
      [Diagram labels: “Find out my ledgers”, “Get missing”, “Get missing”.]
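The effect of limbo on a recovery read can be shown with a toy model. This sketch assumes WQ = 3 and AQ = 2, treats an entry as uncommitted once WQ - AQ + 1 bookies answer NoSuchEntry, and lets a limbo bookie still answer OK for entries it holds. `ToyBookie` and `entry_status` are hypothetical names, not BookKeeper classes.

```python
OK, NO_SUCH_ENTRY, UNKNOWN = "OK", "NoSuchEntry", "Unknown"


class ToyBookie:
    def __init__(self, entries, limbo=False):
        self.entries = set(entries)
        self.limbo = limbo  # set after an unclean shutdown, cleared after repair

    def read(self, entry_id):
        if entry_id in self.entries:
            return OK
        # In limbo, a "no" becomes an "I don't know".
        return UNKNOWN if self.limbo else NO_SUCH_ENTRY


def entry_status(responses, write_quorum, ack_quorum):
    if responses.count(OK) >= ack_quorum:
        return "committed"
    if responses.count(NO_SUCH_ENTRY) >= write_quorum - ack_quorum + 1:
        return "uncommitted"
    return "unknown"


def status_of_entry_1(bookie_2):
    # Bookie 1 holds 0..3, bookie 3 holds only entry 0, as on the slides.
    bookies = [ToyBookie({0, 1, 2, 3}), bookie_2, ToyBookie({0})]
    return entry_status([b.read(1) for b in bookies], 3, 2)


print(status_of_entry_1(ToyBookie(set())))              # uncommitted
print(status_of_entry_1(ToyBookie(set(), limbo=True)))  # unknown
```

Without limbo, the emptied bookie's NoSuchEntry makes a committed entry look uncommitted, so recovery would close the ledger too early. With limbo the result is "unknown", so recovery cannot safely truncate the ledger until the bookie has been repaired.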
  • 20. Ledger Recovery with Limbo Status: read, repair and close
      [Diagram, in five steps, as far as the order can be recovered from the slide text:
        1. Before the failure: Bookie 1 holds entries 0, 1, 2, 3; Bookie 2 holds 0, 1, 2; Bookie 3 holds 0.
        2. Bookie 2 restarts in limbo: Bookie 1 holds 0, 1, 2, 3; Bookie 2 is in limbo; Bookie 3 holds 0.
        3. Read Entry 1: Bookie 1 answers OK, Bookie 2 answers Unknown, Bookie 3 answers NoSuchEntry. Last committed entry = UNKNOWN.
        4. Data repair complete, limbo cleared: Bookie 1 holds 0, 1, 2, 3; Bookie 2 holds 0, 1, 2; Bookie 3 holds 0.
        5. Last committed entry = 2. After repair: Bookie 1 holds 0, 1, 2, 3; Bookie 2 holds 0, 1, 2; Bookie 3 holds 0, 1, 2. Ledger metadata: ensembles: 0 -> b1, b2, b3; last entry id: 2.]
  • 21. Getting Confidence in the Protocol Change
      ● We modelled the BookKeeper replication protocol in TLA+ earlier in the year
      ● Extended it to include:
          ○ Arbitrary data loss
          ○ Limbo status
  • 22. Without limbo: data loss invariant violation
  • 23. Counterexample
  • 24. How Pulsar can run safely at a replication factor of 2... Spoiler alert! It’s BookKeeper!
  • 25. Replication Factor, Toblerone & Maltesers. What has chocolate got to do with distributed messaging systems? The surprising truth!
  • 26. Replication Factor and Monolithic Logs [Diagram labels: Available, Unavailable]
  • 27. Replication Factor and Decoupled, Segmented Logs [Diagram: Fragment 1, Fragment 2, Fragment 3 and Fragment 4, each separated from the next by an ensemble change]
  • 28. Replication Factor & Segment Based Decoupled Storage [Diagram: topics made up of ledgers (Ledger 1 to Ledger 4), with each ledger stored on its own set of bookies drawn from Bookie 1, Bookie 2, Bookie 3, Bookie 4 and Bookie 6]
  • 31. Thank you! Questions...