streamnative.io
TGIP Episode 016
Backlog vs StorageSize
Question #1
Q1: Without retention settings, why is the storage size much larger than
the msg backlog size?
Question #2
Q2: Why does storage size keep growing while the msg backlog size is
small?
Question #3
Q3: What is the expected behavior when combining 100GB retention
policy, 1GB backlog quota and a consumer with
SubscriptionInitialPosition.Earliest setting?
https://github.com/apache/pulsar/issues/7500
Topic & Subscription
Topic Partition
Storage model
#1 Msgs are stored ONCE in a distributed log for each topic partition
Topic Partition Impl
Subscription & Cursor
Storage model
#1 Msgs are stored ONCE in a distributed log for each topic partition
#2 Cursor stores the consumption state of a subscription
Acknowledgment
Storage model
#1 Msgs are stored ONCE in a distributed log for each topic partition
#2 Cursor stores the consumption state of a subscription
#3 Cursor == Offset + Individual Deletes. Acks update cursor.
Multi Subscriptions
Storage model
#1 Msgs are stored ONCE in a distributed log for each topic partition
#2 Cursor stores the consumption state of a subscription
#3 Cursor == Offset + Individual Deletes. Acks update cursor.
#4 Msgs are READY to delete after subscriptions ack them.
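The rule "Cursor == Offset + Individual Deletes" can be sketched as a toy model. This is only an illustration, not Pulsar's implementation: the `Cursor` class and its `mark_delete` and `individually_deleted` fields are hypothetical names, and entries are identified by plain integers.

```python
from dataclasses import dataclass, field


@dataclass
class Cursor:
    """Toy subscription cursor (hypothetical names, not Pulsar's API)."""
    mark_delete: int = -1  # the offset: every entry <= this id is acked
    individually_deleted: set = field(default_factory=set)  # acked entries past the offset

    def ack(self, entry_id: int) -> None:
        if entry_id <= self.mark_delete:
            return  # already covered by the offset
        self.individually_deleted.add(entry_id)
        # Slide the offset forward over any contiguous run of acked entries.
        while self.mark_delete + 1 in self.individually_deleted:
            self.mark_delete += 1
            self.individually_deleted.remove(self.mark_delete)


cursor = Cursor()
for entry_id in (0, 1, 3, 4):  # entry 2 is never acked
    cursor.ack(entry_id)
print(cursor.mark_delete, sorted(cursor.individually_deleted))  # 1 [3, 4]
```

Acking entry 2 afterwards would let the offset jump to 4 and empty the individual-delete set.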
Backlog
✓ Subscription Backlog: All the unacked messages of a subscription
✓ Topic Backlog: The backlog of the slowest subscription
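The two definitions above can be written down in a few lines. This sketch assumes entries are numbered from 0 and that the slowest subscription is the one with the most unacked entries. The function names are hypothetical.

```python
def subscription_backlog(last_entry: int, acked: set) -> int:
    """Count entries 0..last_entry that the subscription has not acked."""
    return sum(1 for entry_id in range(last_entry + 1) if entry_id not in acked)


def topic_backlog(last_entry: int, acked_by_subscription: dict) -> int:
    """Backlog of the slowest subscription, taken here as the largest backlog."""
    return max(
        (subscription_backlog(last_entry, acked)
         for acked in acked_by_subscription.values()),
        default=0,
    )


# 100 entries published; "fast" acked the first 90, "slow" only the first 10.
subs = {"fast": set(range(90)), "slow": set(range(10))}
print(subscription_backlog(99, subs["fast"]))  # 10
print(topic_backlog(99, subs))                 # 90
```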
Backlog Stats
✓ backlogSize: The total bytes of unacked messages
✓ msgBacklog: The total number of unacked entries
✓ Use `bin/pulsar-admin topics stats` to query the statistics of a
topic
https://github.com/apache/pulsar/issues/7484
Storage model
#1 Msgs are stored ONCE in a distributed log for each topic partition
#2 Cursor stores the consumption state of a subscription
#3 Cursor == Offset + Individual Deletes. Acks update cursor.
#4 Msgs are READY to delete after subscriptions ack them.
#5 The unacked msgs are “kept” in a subscription backlog.
Backlog Quota
Backlog Retention Policy
- producer_request_hold
- producer_exception
- consumer_backlog_eviction
Backlog Quota Config
✓ Cluster config
- backlogQuotaDefaultLimitGB
- backlogQuotaDefaultRetentionPolicy
- backlogQuotaCheckEnabled
✓ Namespace policy
Usage: set-backlog-quota [options] tenant/namespace
Options:
* -l, --limit
Size limit (eg: 10M, 16G)
* -p, --policy
Retention policy to enforce when the limit is reached. Valid options are:
[producer_request_hold, producer_exception, consumer_backlog_eviction]
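A rough way to picture the three policies is a function that decides what happens to a publish once the backlog reaches the limit. This is a simplified sketch: the `on_publish` function and its return strings are made up for illustration, and a real broker does not necessarily check the quota on every publish.

```python
from enum import Enum


class Policy(Enum):
    PRODUCER_REQUEST_HOLD = "producer_request_hold"
    PRODUCER_EXCEPTION = "producer_exception"
    CONSUMER_BACKLOG_EVICTION = "consumer_backlog_eviction"


def on_publish(backlog_bytes: int, limit_bytes: int, policy: Policy) -> str:
    """Describe what this toy broker does with a publish request."""
    if backlog_bytes < limit_bytes:
        return "accept"
    if policy is Policy.PRODUCER_REQUEST_HOLD:
        return "hold the producer's request"
    if policy is Policy.PRODUCER_EXCEPTION:
        return "fail the producer with an exception"
    # consumer_backlog_eviction: producers continue, old backlog is dropped.
    return "accept and evict the oldest backlog messages"


one_gb = 1024 ** 3
print(on_publish(2 * one_gb, one_gb, Policy.CONSUMER_BACKLOG_EVICTION))
```

The first two policies push back on producers, while the third protects producers at the cost of dropping unacked messages.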
Storage model
#1 Msgs are stored ONCE in a distributed log for each topic partition
#2 Cursor stores the consumption state of a subscription
#3 Cursor == Offset + Individual Deletes. Acks update cursor.
#4 Msgs are READY to delete after subscriptions ack them.
#5 The unacked msgs are “kept” in a subscription backlog.
#6 Backlog quota sets a CAP on unacked messages.
TTL
✓ TTL defines the amount of time a message is allowed to stay in the
unacknowledged state
TTL Config
✓ Cluster config
- ttlDurationDefaultInSeconds
✓ Namespace policy
Set Message TTL for a namespace
Usage: set-message-ttl [options] tenant/namespace
Options:
* --messageTTL, -ttl
Message TTL in seconds
Default: 0
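As a sketch of what TTL does to a subscription, the function below treats every unacked entry older than the TTL as if it had been acked. It is a toy model: `expire_by_ttl` is a hypothetical name, timestamps are plain seconds, and a TTL of 0 is assumed here to mean that TTL is disabled.

```python
def expire_by_ttl(publish_times: dict, acked: set, now: float, ttl_seconds: int) -> set:
    """Return the acked set after expiring unacked entries older than the TTL."""
    if ttl_seconds <= 0:  # assumption: 0 (the default) means no TTL
        return set(acked)
    expired = {
        entry_id
        for entry_id, published_at in publish_times.items()
        if entry_id not in acked and now - published_at > ttl_seconds
    }
    return acked | expired


publish_times = {0: 0.0, 1: 50.0, 2: 100.0}  # entry id -> publish time in seconds
print(sorted(expire_by_ttl(publish_times, {1}, now=130.0, ttl_seconds=60)))  # [0, 1]
```

Entry 0 is 130 seconds old and is expired, entry 2 is only 30 seconds old and stays in the backlog.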
Storage model
#1 Msgs are stored ONCE in a distributed log for each topic partition
#2 Cursor stores the consumption state of a subscription
#3 Cursor == Offset + Individual Deletes. Acks update cursor.
#4 Msgs are READY to delete after subscriptions ack them.
#5 The unacked msgs are “kept” in a subscription backlog.
#6 Backlog quota sets a CAP on unacked messages.
#7 TTL defines the time a msg can stay in the unacknowledged state
Retention
✓ Retention Policy defines the amount of time acked messages are kept
before deletion
Retention Config
✓ Cluster config
- defaultRetentionTimeInMinutes
- defaultRetentionSizeInMB
✓ Namespace policy
Set the retention policy for a namespace
Usage: set-retention [options] tenant/namespace
Options:
* --size, -s
Retention size limit (eg: 10M, 16G, 3T). 0 or less than 1MB means no
retention and -1 means infinite size retention
* --time, -t
Retention time in minutes (or minutes, hours,days,weeks eg: 100m, 3h, 2d,
5w). 0 means no retention and -1 means infinite time retention
Storage model
#1 Msgs are stored ONCE in a distributed log for each topic partition
#2 Cursor stores the consumption state of a subscription
#3 Cursor == Offset + Individual Deletes. Acks update cursor.
#4 Msgs are READY to delete after subscriptions ack them.
#5 The unacked msgs are “kept” in a subscription backlog.
#6 Backlog quota sets a CAP on unacked messages.
#7 TTL defines the time a msg can stay in the unacknowledged state
#8 Retention Policy defines how to handle acked messages
Put everything together
Storage Size
✓ Storage Size is the total amount of data that has not been DELETED.
✓ Storage Size = Backlog Size + Retained Messages Size
Storage Size
✓ Messages are not deleted one by one
✓ Messages are deleted segment by segment
✓ Messages past the retention period are kept as long as other
messages in the same segment are still within the retention period
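Segment-level deletion can be illustrated with a small calculation. The sketch assumes every message is already acked, represents each segment as a `(size_bytes, message_ages_minutes)` pair, and uses hypothetical function names. It ignores other details of a real deployment.

```python
def is_deletable(message_ages_minutes: list, retention_minutes: int) -> bool:
    """A segment can go only when every message in it has aged past retention."""
    return all(age > retention_minutes for age in message_ages_minutes)


def storage_size(segments: list, retention_minutes: int) -> int:
    """Sum the sizes of the segments that must still be kept."""
    return sum(
        size for size, ages in segments
        if not is_deletable(ages, retention_minutes)
    )


segments = [
    (100, [120, 110, 90]),  # all past a 60-minute retention: deletable
    (100, [80, 70, 30]),    # one message (30 min) is still retained: segment stays
    (100, [20, 10, 5]),     # all within retention
]
print(storage_size(segments, 60))  # 200
```

The second segment is kept in full even though two of its three messages are already past the retention period.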
Storage model
#1 Msgs are stored ONCE in a distributed log for each topic partition
#2 Cursor stores the consumption state of a subscription
#3 Cursor == Offset + Individual Deletes. Acks update cursor.
#4 Msgs are READY to delete after subscriptions ack them.
#5 The unacked msgs are “kept” in a subscription backlog.
#6 Backlog quota sets a CAP on unacked messages.
#7 TTL defines the time a msg can stay in the unacknowledged state
#8 Retention Policy defines how to handle acked messages
#9 Messages are deleted segment by segment (not individually)
Msg Backlog vs Storage Size
#a backlog == storage size
#b storage size >> backlog
Troubleshooting
✓ `pulsar-admin topics stats-internal`
✓ A hole in `individuallyDeletedMessages` causes the storage size to
keep growing while the backlog remains small
✓ `pulsar-admin topics unload` to trigger redelivery
✓ Set proper ackTimeout
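The effect of a single hole can be shown with numbers. In this toy model, with hypothetical names and integer entry ids, a segment is trimmable only if its last entry is at or before the slowest cursor's offset. One unacked entry at the start therefore pins every segment.

```python
def trimmable_segments(segment_bounds: list, mark_delete: int) -> int:
    """Count segments whose last entry is at or before the cursor's offset."""
    return sum(1 for _first, last in segment_bounds if last <= mark_delete)


# 10 segments of 100 entries each: (0, 99), (100, 199), ...
segment_bounds = [(start, start + 99) for start in range(0, 1000, 100)]

# The consumer acked entries 1..999 but never acked entry 0.
mark_delete = -1                       # the offset cannot move past the hole
individually_deleted = set(range(1, 1000))

backlog = 1000 - len(individually_deleted)
print(backlog, trimmable_segments(segment_bounds, mark_delete))  # 1 0
```

The backlog is a single message, yet none of the 10 segments can be trimmed. Once entry 0 is redelivered and acked, the offset reaches 999 and all of them become trimmable.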
Storage Size vs Disk Usage
Storage model
#1 Msgs are stored ONCE in a distributed log for each topic partition
#2 Cursor stores the consumption state of a subscription
#3 Cursor == Offset + Individual Deletes. Acks update cursor.
#4 Msgs are READY to delete after subscriptions ack them.
#5 The unacked msgs are “kept” in a subscription backlog.
#6 Backlog quota sets a CAP on unacked messages.
#7 TTL defines the time a msg can stay in the unacknowledged state
#8 Retention Policy defines how to handle acked messages
#9 Messages are deleted segment by segment (not individually)
#10 The disk space of DELETED msgs is reclaimed lazily
Go to production
✓ Set backlog quota and policy
✓ Set TTL policy
✓ Set retention policy
✓ Tune settings related to storage size
- managedLedgerMinLedgerRolloverTimeMinutes
- managedLedgerMaxLedgerRolloverTimeMinutes
- managedLedgerMaxSizePerLedgerMbytes
✓ Tune bookie GC settings
- gcWaitTime
- minorCompactionInterval
- majorCompactionThreshold

TGI Pulsar - episode 016: backlog vs storage size

  • 2. Question #1 Q1: Without retention settings, why is storage size much larger than msg backlog size?
  • 3. Question #2 Q2: Why does storage size keep growing while the msg backlog size is small?
  • 4. Question #3 Q3: What is the expected behavior when combining 100GB retention policy, 1GB backlog quota and a consumer with SubscriptionInitialPosition.Earliest setting? https://github.com/apache/pulsar/issues/7500
  • 7. Storage model #1 Msgs are stored ONCE in a distributed log for each topic partition
  • 10. Storage model #1 Msgs are stored ONCE in a distributed log for each topic partition #2 Cursor stores the consumption state of a subscription
  • 12. Storage model #1 Msgs are stored ONCE in a distributed log for each topic partition #2 Cursor stores the consumption state of a subscription #3 Cursor == Offset + Individual Deletes. Acks update cursor.
  • 14. Storage model #1 Msgs are stored ONCE in a distributed log for each topic partition #2 Cursor stores the consumption state of a subscription #3 Cursor == Offset + Individual Deletes. Acks update cursor. #4 Msgs are READY to delete after subscriptions ack them.
  • 15. Backlog ✓ Subscription Backlog: All the unacked messages of a subscription ✓ Topic Backlog: The backlog of the slowest subscription
  • 16. Backlog Stats ✓ backlogSize: The total bytes of unacked messages ✓ msgBacklog: The total number of unacked entries ✓ Use `bin/pulsar-admin topics stats` to query the statistics of a topic https://github.com/apache/pulsar/issues/7484
  • 17. Storage model #1 Msgs are stored ONCE in a distributed log for each topic partition #2 Cursor stores the consumption state of a subscription #3 Cursor == Offset + Individual Deletes. Acks update cursor. #4 Msgs are READY to delete after subscriptions ack them. #5 The unacked msgs are “kept” in a subscription backlog.
  • 19. Backlog Retention Policy - producer_request_hold - producer_exception - consumer_backlog_eviction
  • 20. Backlog Quota Config ✓ Cluster config - backlogQuotaDefaultLimitGB - backlogQuotaDefaultRetentionPolicy - backlogQuotaCheckEnabled ✓ Namespace policy Usage: set-backlog-quota [options] tenant/namespace Options: * -l, --limit Size limit (eg: 10M, 16G) * -p, --policy Retention policy to enforce when the limit is reached. Valid options are: [producer_request_hold, producer_exception, consumer_backlog_eviction]
  • 21. Storage model #1 Msgs are stored ONCE in a distributed log for each topic partition #2 Cursor stores the consumption state of a subscription #3 Cursor == Offset + Individual Deletes. Acks update cursor. #4 Msgs are READY to delete after subscriptions ack them. #5 The unacked msgs are “kept” in a subscription backlog. #6 Backlog quota sets a CAP on unacked messages.
  • 22. TTL ✓ TTL defines the amount of time a message is allowed to stay in the unacknowledged state
  • 23. TTL Config ✓ Cluster config - ttlDurationDefaultInSeconds ✓ Namespace policy Set Message TTL for a namespace Usage: set-message-ttl [options] tenant/namespace Options: * --messageTTL, -ttl Message TTL in seconds Default: 0
  • 24. Storage model #1 Msgs are stored ONCE in a distributed log for each topic partition #2 Cursor stores the consumption state of a subscription #3 Cursor == Offset + Individual Deletes. Acks update cursor. #4 Msgs are READY to delete after subscriptions ack them. #5 The unacked msgs are “kept” in a subscription backlog. #6 Backlog quota sets a CAP on unacked messages. #7 TTL defines the time a msg can stay in the unacknowledged state
  • 25. Retention ✓ Retention Policy defines how long acked messages are kept before deletion
  • 26. Retention Config ✓ Cluster config - defaultRetentionTimeInMinutes - defaultRetentionSizeInMB ✓ Namespace policy Set the retention policy for a namespace Usage: set-retention [options] tenant/namespace Options: * --size, -s Retention size limit (eg: 10M, 16G, 3T). 0 or less than 1MB means no retention and -1 means infinite size retention * --time, -t Retention time in minutes (or minutes, hours,days,weeks eg: 100m, 3h, 2d, 5w). 0 means no retention and -1 means infinite time retention
  • 27. Storage model #1 Msgs are stored ONCE in a distributed log for each topic partition #2 Cursor stores the consumption state of a subscription #3 Cursor == Offset + Individual Deletes. Acks update cursor. #4 Msgs are READY to delete after subscriptions ack them. #5 The unacked msgs are “kept” in a subscription backlog. #6 Backlog quota sets a CAP on unacked messages. #7 TTL defines the time a msg can stay in the unacknowledged state #8 Retention Policy defines how to handle acked messages
  • 29. Storage Size ✓ Storage Size is the total amount of data that is not DELETED. ✓ Storage Size = Backlog Size + Retained Messages Size
  • 30. Storage Size ✓ Messages are not deleted one by one ✓ Messages are deleted segment by segment ✓ Messages past the retention period are kept as long as other messages in the same segment are still within the retention period
  • 31. Storage model #1 Msgs are stored ONCE in a distributed log for each topic partition #2 Cursor stores the consumption state of a subscription #3 Cursor == Offset + Individual Deletes. Acks update cursor. #4 Msgs are READY to delete after subscriptions ack them. #5 The unacked msgs are “kept” in a subscription backlog. #6 Backlog quota sets a CAP on unacked messages. #7 TTL defines the time a msg can stay in the unacknowledged state #8 Retention Policy defines how to handle acked messages #9 Messages are deleted segment by segment (not individually)
  • 32. Msg Backlog vs Storage Size #a backlog == storage size #b storage size >> backlog
  • 33. Troubleshooting ✓ `pulsar-admin topics stats-internal` ✓ A hole in `individuallyDeletedMessages` causes storage size to keep growing while the backlog remains small ✓ `pulsar-admin topics unload` to trigger redelivery ✓ Set a proper ackTimeout
  • 34. Storage Size vs Disk Usage
  • 35. Storage model #1 Msgs are stored ONCE in a distributed log for each topic partition #2 Cursor stores the consumption state of a subscription #3 Cursor == Offset + Individual Deletes. Acks update cursor. #4 Msgs are READY to delete after subscriptions ack them. #5 The unacked msgs are “kept” in a subscription backlog. #6 Backlog quota sets a CAP on unacked messages. #7 TTL defines the time a msg can stay in the unacknowledged state #8 Retention Policy defines how to handle acked messages #9 Messages are deleted segment by segment (not individually) #10 The disk space of DELETED msgs is reclaimed lazily
  • 36. Go to production ✓ Set backlog quota and policy ✓ Set TTL policy ✓ Set retention policy ✓ Tune settings related to storage size - managedLedgerMinLedgerRolloverTimeMinutes - managedLedgerMaxLedgerRolloverTimeMinutes - managedLedgerMaxSizePerLedgerMbytes ✓ Tune bookie GC settings - gcWaitTime - minorCompactionInterval - majorCompactionThreshold
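
The storage model rules above (#3, #4 and #9) and the "hole" case from the troubleshooting slide can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not Pulsar's implementation: one subscription, no retention, uniform entry sizes, and segments that roll over after a fixed number of entries. The class `Topic` and its fields (`entries_per_segment`, `mark_delete`, `first_segment`, `trim`) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy model of one topic partition with a single subscription (not Pulsar code)."""
    entries_per_segment: int   # assumed rollover rule: fixed entry count per segment
    entry_size: int = 1        # bytes per entry, uniform for simplicity
    total: int = 0             # number of entries published so far
    mark_delete: int = -1      # last entry of the contiguous acked prefix ("offset")
    individually_deleted: set = field(default_factory=set)  # acks beyond mark_delete
    first_segment: int = 0     # index of the oldest segment that still exists

    def publish(self, n):
        self.total += n

    def ack(self, entry_id):
        """Record an ack; advance mark_delete while the acked prefix is contiguous."""
        if entry_id <= self.mark_delete:
            return
        self.individually_deleted.add(entry_id)
        while self.mark_delete + 1 in self.individually_deleted:
            self.mark_delete += 1
            self.individually_deleted.remove(self.mark_delete)

    def trim(self):
        """Delete whole segments that lie entirely at or before mark_delete.

        The last (still open) segment is never deleted in this model.
        """
        last_segment = (self.total - 1) // self.entries_per_segment
        while self.first_segment < last_segment:
            segment_end = (self.first_segment + 1) * self.entries_per_segment - 1
            if segment_end > self.mark_delete:
                break
            self.first_segment += 1

    def msg_backlog(self):
        """Number of unacked entries."""
        acked = (self.mark_delete + 1) + len(self.individually_deleted)
        return self.total - acked

    def storage_size(self):
        """Bytes held by all segments that have not been deleted."""
        kept_entries = self.total - self.first_segment * self.entries_per_segment
        return kept_entries * self.entry_size


if __name__ == "__main__":
    topic = Topic(entries_per_segment=10)
    topic.publish(100)
    for entry_id in range(100):
        if entry_id != 5:          # entry 5 is never acked: a "hole"
            topic.ack(entry_id)
    topic.trim()
    print(topic.msg_backlog(), topic.storage_size())   # backlog is tiny, storage is not

    topic.ack(5)                   # e.g. after redelivery, the hole is finally acked
    topic.trim()
    print(topic.msg_backlog(), topic.storage_size())
```

In this model a single unacked entry pins `mark_delete` near the start of the log, so no segment can be trimmed even though almost everything has been acked. Once the hole is acked, every closed segment becomes deletable and only the open segment remains, which is one way to picture case #b (storage size >> backlog) turning back into a small storage size.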