© 2019 SPLUNK INC.
Apache Pulsar: The Next Generation Messaging and Queuing System
Intro
Matteo Merli: Senior Principal Engineer, Splunk; co-creator of Apache Pulsar
Karthik Ramasamy: Senior Director of Engineering, Splunk
Messaging and Streaming
Messaging
Message passing between components, applications, and services
Streaming
Analyze events that just
happened
Messaging vs Streaming
2 worlds, 1 infra
Use cases
Messaging
● OLTP, Integration
● Main challenges:
○ Latency
○ Availability
○ Data durability
○ High-level features
■ Routing, DLQ, delays, individual acks
Streaming
● Real-time analytics
● Main challenges:
○ Throughput
○ Ordering
○ Stateful processing
○ Batch + Real-Time
Storage
Messaging
Compute
Apache Pulsar
Flexible Pub-Sub and Compute backed by durable log storage
● Durability: Data replicated and synced to disk
● Low Latency: Low publish latency of 5 ms at the 99th percentile
● High Throughput: Can reach 1.8 M messages/s in a single partition
● High Availability: System is available if any 2 nodes are up
● Cloud Native: Take advantage of dynamic cluster scaling in cloud environments
Apache Pulsar
Flexible Pub-Sub and Compute backed by durable log storage
● Unified messaging model: Supports both Topic & Queue semantics in a single model
● Highly Scalable: Can support millions of topics
● Native Compute: Lightweight compute framework based on functions
● Multi Tenant: Supports multiple users and workloads in a single cluster
● Geo Replication: Out-of-the-box support for geographically distributed applications
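The "Topic & Queue semantics in a single model" point can be illustrated with a toy in-memory sketch. This is not the Pulsar client API: `Topic`, `Subscription`, and the round-robin dispatch are hypothetical names and simplifications chosen for illustration. Every subscription sees every message (topic semantics), while consumers attached to the same subscription split the messages between them (queue semantics).

```python
from collections import defaultdict
from itertools import cycle


class Subscription:
    """Toy subscription: delivers each message to exactly one of its consumers."""

    def __init__(self, consumers):
        self.received = defaultdict(list)   # consumer name -> messages received
        self._next = cycle(consumers)       # simple round-robin dispatch

    def deliver(self, message):
        self.received[next(self._next)].append(message)


class Topic:
    """Toy topic: fans every published message out to all subscriptions."""

    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, name, consumers):
        self.subscriptions[name] = Subscription(consumers)
        return self.subscriptions[name]

    def publish(self, message):
        for sub in self.subscriptions.values():
            sub.deliver(message)


topic = Topic()
audit = topic.subscribe("audit", ["a1"])             # one consumer: sees everything
workers = topic.subscribe("workers", ["w1", "w2"])   # two consumers: share the load

for i in range(4):
    topic.publish(f"msg-{i}")
```

Here `audit` receives all four messages, while `w1` and `w2` each receive half of them, so the same topic behaves as a pub-sub stream for one subscription and as a work queue for another.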
Apache Pulsar project in numbers
● 192 Contributors
● 30 Committers
● 100s of Adopters
● 4.6K GitHub Stars
Sample of Pulsar users and contributors
Messaging Model
Pulsar Client libraries
● Java — C++ — C — Python — Go — NodeJS — WebSocket APIs
● Partitioned topics
● Apache Kafka compatibility wrapper API
● Transparent batching and compression
● TLS encryption and authentication
● End-to-end encryption
Architectural view
Separate layers for brokers and bookies
● Brokers and bookies can be added independently
● Traffic can be shifted very quickly across brokers
● New bookies will ramp up on traffic quickly
Apache BookKeeper
Replicated log storage
● Low-latency durable writes
● Simple repeatable read consistency
● Highly available
● Store many logs per node
● I/O Isolation
Inside BookKeeper
Storage optimized for sequential & immutable data
● IO isolation between write and read operations
● Does not rely on OS page cache
● Slow consumers won’t impact latency
● Very effective IO patterns:
○ Journal: append only and no reads
○ Storage device: bulk write and sequential reads
● Number of files is independent of the number of topics
Segment Centric Storage
In addition to partitioning, messages are stored in segments (based on time and size)
Segments are independent of each other and spread across all storage nodes
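A minimal sketch of the segment idea, under stated assumptions: this is a toy model, not BookKeeper's placement logic. `SegmentedLog`, the rotation-based placement, and rolling by entry count only (the slide mentions time and size) are all simplifications made up for illustration. It shows that a newly added storage node starts receiving new segments without any existing segment being moved.

```python
class SegmentedLog:
    """Toy log that rolls over to a new segment once the current one is full."""

    def __init__(self, nodes, max_entries_per_segment, replicas=2):
        self.nodes = list(nodes)
        self.max_entries = max_entries_per_segment
        self.replicas = replicas
        self.segments = []       # each segment: {"nodes": [...], "entries": [...]}
        self._placements = 0     # counter used to rotate segment placement

    def _open_segment(self):
        # Pick `replicas` nodes, rotating the starting node so segments spread out.
        start = self._placements % len(self.nodes)
        chosen = [self.nodes[(start + i) % len(self.nodes)]
                  for i in range(self.replicas)]
        self._placements += 1
        self.segments.append({"nodes": chosen, "entries": []})

    def append(self, entry):
        if not self.segments or len(self.segments[-1]["entries"]) >= self.max_entries:
            self._open_segment()
        self.segments[-1]["entries"].append(entry)

    def add_node(self, node):
        # Existing segments stay where they are; only new segments use the node.
        self.nodes.append(node)


log = SegmentedLog(["bookie-1", "bookie-2", "bookie-3"], max_entries_per_segment=2)
for i in range(4):
    log.append(i)
log.add_node("bookie-4")
for i in range(4, 8):
    log.append(i)
placement = [seg["nodes"] for seg in log.segments]
```

After the run, the first two segments still live on the original three nodes, and the segments opened after `add_node` already include `bookie-4`.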
Segments vs Partitions
Tiered Storage
Unlimited topic storage capacity
Achieves the true “stream-storage”: keep the raw data forever in stream form
Extremely cost effective
Schema Registry
Store information on the data structure — Stored in BookKeeper
Enforce data types on topic
Allow for compatible schema evolutions
Schema Registry
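Type Safe API
● Integrated schema in API
● End-to-end type safety, enforced in the Pulsar broker
    import org.apache.pulsar.client.api.Consumer;
    import org.apache.pulsar.client.api.Message;
    import org.apache.pulsar.client.api.Producer;
    import org.apache.pulsar.client.api.Schema;

    // Assumes `client` is an already-built PulsarClient and MyClass is a plain Java class.
    // The producer is typed: only MyClass instances can be sent.
    Producer<MyClass> producer = client
            .newProducer(Schema.JSON(MyClass.class))
            .topic("my-topic")
            .create();
    producer.send(new MyClass(1, 2));

    // The consumer declares the same schema, so messages arrive as MyClass.
    Consumer<MyClass> consumer = client
            .newConsumer(Schema.JSON(MyClass.class))
            .topic("my-topic")
            .subscriptionName("my-subscription")
            .subscribe();
    Message<MyClass> msg = consumer.receive();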
Geo Replication
Scalable asynchronous replication
Integrated in the broker message flow
Simple configuration to add/remove regions
Replicated Subscriptions
Migrate subscriptions across geo-replicated clusters
● Consumption will restart close to where a consumer left off, with a small number of duplicates
● Implementation
○ Use markers injected into the data flow
○ Create a consistent snapshot of message IDs across clusters
○ Establish a relationship: if a consumer has consumed MA-1 in Cluster-A, it must have consumed MB-2 in Cluster-B
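The snapshot relationship can be sketched as a lookup from a position in one cluster to a safe position in the other. This is a toy model under stated assumptions, not the broker's implementation: positions are plain integers, the snapshot values are made up, and `translate_position` is a hypothetical helper. It resumes from the most recent snapshot at or before the acknowledged position, which is why a small number of messages may be delivered again.

```python
from bisect import bisect_right

# Each snapshot pairs positions known to be equivalent: everything up to a_pos in
# Cluster-A corresponds to everything up to b_pos in Cluster-B.
# The values are invented for illustration.
snapshots = [(10, 7), (20, 18), (30, 26)]   # sorted by position in Cluster-A


def translate_position(acked_in_a, snapshots):
    """Return the Cluster-B position to resume from, or None if no snapshot applies."""
    a_positions = [a for a, _ in snapshots]
    idx = bisect_right(a_positions, acked_in_a) - 1
    if idx < 0:
        return None   # no snapshot covers this position yet
    return snapshots[idx][1]


# The consumer acknowledged up to position 24 in Cluster-A before failing over.
resume_at = translate_position(24, snapshots)
```

With these values the consumer resumes in Cluster-B from the position recorded in the (20, 18) snapshot, so the messages between positions 20 and 24 in Cluster-A correspond to the duplicates it may see.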
Multi-Tenancy
● Authentication / Authorization / Namespaces / Admin APIs
● I/O isolation between writes and reads
○ Provided by BookKeeper
○ Ensure readers draining backlog won’t affect publishers
● Soft isolation
○ Storage quotas — flow-control — back-pressure — rate limiting
● Hardware isolation
○ Constrain some tenants on a subset of brokers or bookies
A single Pulsar cluster supports multiple users and mixed workloads
Lightweight Compute with Pulsar Functions
Pulsar Functions
Pulsar Functions
● User supplied compute against a consumed message
○ ETL, data enrichment, filtering, routing
● Simplest possible API
○ Use language specific “function” notation
○ No SDK required
○ SDK available for more advanced features (state, metrics, logging, …)
● Language agnostic
○ Java, Python and Go
○ Easy to support more languages
● Pluggable runtime
○ Managed or manual deployment
○ Run as threads, processes or containers in Kubernetes
Pulsar Functions
Examples
Python
    def process(input):
        return input + '!'
Java
    import java.util.function.Function;

    public class ExclamationFunction
            implements Function<String, String> {
        @Override
        public String apply(String input) {
            return input + "!";
        }
    }
Pulsar Functions
● Functions can store state in stream storage
● State is global and replicated
● Multiple instances of the same function can access the same state
● Functions framework provides simple abstraction over state
State management
Pulsar Functions
● Implemented on top of Apache BookKeeper “Table Service”
● BookKeeper provides a sharded key/value store based on:
○ Log & Snapshot - Stored as BookKeeper ledgers
○ Warm replicas that can be quickly promoted to leader
● In case of leader failure there is no downtime or huge log to replay
State management
Pulsar Functions
State example
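    import org.apache.pulsar.functions.api.Context;
    import org.apache.pulsar.functions.api.PulsarFunction;

    public class CounterFunction
            implements PulsarFunction<String, Void> {
        @Override
        public Void process(String input, Context context) {
            // split() takes a regular expression, so the dot must be escaped;
            // an unescaped "." matches every character and yields no words.
            for (String word : input.split("\\.")) {
                // Increment the per-word counter kept in function state.
                context.incrCounter(word, 1);
            }
            return null;
        }
    }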
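The same counting logic can be tried locally in Python without a running cluster. This is a sketch under stated assumptions: `FakeContext` is a hypothetical in-memory stand-in written for this example, and its `incr_counter` / `get_counter` methods are assumed to mirror the counter calls of the Pulsar Functions Python context. In a real deployment the runtime supplies the context and stores the state.

```python
from collections import Counter


class FakeContext:
    """Hypothetical in-memory stand-in for the function context (local testing only)."""

    def __init__(self):
        self._counters = Counter()

    def incr_counter(self, key, amount):
        self._counters[key] += amount

    def get_counter(self, key):
        return self._counters[key]


class CounterFunction:
    """Counts how often each dot-separated token has been seen."""

    def process(self, input, context):
        # str.split takes a literal separator in Python, so "." needs no escaping.
        for word in input.split("."):
            context.incr_counter(word, 1)
        return None


context = FakeContext()
function = CounterFunction()
function.process("a.b.a", context)
function.process("b.c", context)
```

Because the counters live in the context rather than in the function object, a second instance given the same context would see the same counts, which is the property the state slides describe.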
Pulsar IO
Connectors Framework based on Pulsar Functions
Built-in Pulsar IO connectors
Querying data stored in Pulsar
Pulsar SQL
● Uses Presto for interactive SQL queries over data stored in Pulsar
● Query historic and real-time data
● Integrated with schema registry
● Can join with data from other sources
Pulsar SQL
● Read data directly from BookKeeper into Presto, bypassing the Pulsar broker
● Many-to-many data reads
○ Data is split even on a single partition: multiple workers can read data in parallel from a single Pulsar partition
● Time-based indexing: use “publishTime” in predicates to reduce the data being read from disk
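The two read optimizations above can be sketched together. This is a toy model, not the Presto connector: `Segment`, `prune_segments`, and `assign_to_workers` are hypothetical names, and publish times are plain integers. Segments whose publish-time range cannot match the predicate are skipped, and the remaining segments of one partition are spread over several workers.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    segment_id: int
    min_publish_time: int   # earliest publish time stored in the segment
    max_publish_time: int   # latest publish time stored in the segment


def prune_segments(segments, start, end):
    """Keep only segments whose publish-time range overlaps [start, end]."""
    return [s for s in segments
            if s.max_publish_time >= start and s.min_publish_time <= end]


def assign_to_workers(segments, workers):
    """Spread the remaining segments of one partition across several workers."""
    plan = {w: [] for w in workers}
    for i, seg in enumerate(segments):
        plan[workers[i % len(workers)]].append(seg.segment_id)
    return plan


segments = [Segment(0, 0, 99), Segment(1, 100, 199),
            Segment(2, 200, 299), Segment(3, 300, 399)]

# Predicate equivalent to: publish time between 150 and 320.
to_read = prune_segments(segments, start=150, end=320)
plan = assign_to_workers(to_read, ["worker-a", "worker-b"])
```

Segment 0 is never read because none of its messages can satisfy the predicate, and the other three segments are read by two workers in parallel even though they belong to a single partition.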
Pulsar Storage API
● Work in progress to allow direct access to data stored in Pulsar
● Generalization of the work done for Presto connector
● Most efficient way to retrieve and process data from “batch” execution engines
Thank You

Apache Pulsar: The Next Generation Messaging and Queuing System

  • 1. © 2019 SPLUNK INC. The Next Generation Messaging and Queuing System
  • 2. © 2019 SPLUNK INC. Intro Senior Principal Engineer - Splunk Co-creator Apache Pulsar Matteo Merli Senior Director of Engineering - Splunk Karthik Ramasamy
  • 3. © 2019 SPLUNK INC. Messaging and Streaming
  • 4. © 2019 SPLUNK INC. Messaging Message passing between components, application, services
  • 5. © 2019 SPLUNK INC. Streaming Analyze events that just happened
  • 6. © 2019 SPLUNK INC. Messaging vs Streaming 2 worlds, 1 infra
  • 7. © 2019 SPLUNK INC. Use cases ● OLTP, Integration ● Main challenges: ○ Latency ○ Availability ○ Data durability ○ High level features ■ Routing, DLQ, delays, individual acks ● Real-time analytics ● Main challenges: ○ Throughput ○ Ordering ○ Stateful processing ○ Batch + Real-Time Messaging Streaming
  • 8. © 2019 SPLUNK INC. Storage Messaging Compute
  • 9. © 2019 SPLUNK INC. Apache Pulsar Data replicated and synced to disk Durability Low publish latency of 5ms at 99pct Low Latency Can reach 1.8 M messages/s in a single partition High Throughput System is available if any 2 nodes are up High Availability Take advantage of dynamic cluster scaling in cloud environments Cloud Native Flexible Pub-Sub and Compute backed by durable log storage
  • 10. © 2019 SPLUNK INC. Apache Pulsar Support both Topic & Queue semantic in a single model Unified messaging model Can support millions of topics Highly Scalable Lightweight compute framework based on functions Native Compute Supports multiple users and workloads in a single cluster Multi Tenant Out of box support for geographically distributed applications Geo Replication Flexible Pub-Sub and Compute backed by durable log storage
  • 11. Apache Pulsar project in numbers: 192 contributors, 30 committers, 100s of adopters, 4.6K GitHub stars
  • 12. Sample of Pulsar users and contributors
  • 13. Messaging Model
  • 14. Pulsar Client libraries
      ● Java, C++, C, Python, Go, NodeJS, WebSocket APIs
      ● Partitioned topics
      ● Apache Kafka compatibility wrapper API
      ● Transparent batching and compression
      ● TLS encryption and authentication
      ● End-to-end encryption
  • 15. Architectural view: separate layers for brokers and bookies
      ● Brokers and bookies can be added independently
      ● Traffic can be shifted very quickly across brokers
      ● New bookies ramp up on traffic quickly
  • 16. Apache BookKeeper: replicated log storage
      ● Low-latency durable writes
      ● Simple repeatable read consistency
      ● Highly available
      ● Store many logs per node
      ● I/O isolation
  • 17. Inside BookKeeper: storage optimized for sequential and immutable data
      ● I/O isolation between write and read operations
      ● Does not rely on the OS page cache
      ● Slow consumers won't impact latency
      ● Very effective I/O patterns:
        ○ Journal: append only, no reads
        ○ Storage device: bulk writes and sequential reads
      ● Number of files is independent of the number of topics
  • 18. Segment Centric Storage
      ● In addition to partitioning, messages are stored in segments (based on time and size)
      ● Segments are independent of each other and spread across all storage nodes
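The segment idea on this slide can be illustrated with a small, self-contained sketch. It is a toy model, not BookKeeper's placement logic: the function names, the size-only rollover rule (time-based rollover is left out), and the rotating placement are all assumptions made for illustration.

```python
def roll_segments(message_sizes, max_segment_bytes):
    """Group consecutive messages into segments.

    A segment is closed as soon as adding the next message would push it
    past max_segment_bytes (size-based rollover only; hypothetical rule).
    """
    segments, current, current_bytes = [], [], 0
    for size in message_sizes:
        if current and current_bytes + size > max_segment_bytes:
            segments.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        segments.append(current)
    return segments


def place_segments(segments, nodes, replicas=2):
    """Assign each segment to `replicas` distinct nodes, rotating the start node.

    Because every segment is placed independently, one topic's data ends up
    spread over all storage nodes instead of being pinned to a single one.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        index: [nodes[(index + r) % len(nodes)] for r in range(replicas)]
        for index in range(len(segments))
    }


segments = roll_segments([40, 40, 40, 40, 40], max_segment_bytes=100)
placement = place_segments(segments, ["bookie-1", "bookie-2", "bookie-3"])
```

In this model a newly added node starts receiving segments as soon as it joins the `nodes` list, without moving any existing data.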
  • 19. Segments vs Partitions
  • 20. Tiered Storage
      ● Unlimited topic storage capacity
      ● Achieves true "stream storage": keep the raw data forever in stream form
      ● Extremely cost effective
  • 21. Schema Registry
      ● Stores information on the data structure (stored in BookKeeper)
      ● Enforces data types on a topic
      ● Allows for compatible schema evolution
  • 22. Schema Registry: type-safe API
      ● Integrated schema in API
      ● End-to-end type safety, enforced in the Pulsar broker

        Producer<MyClass> producer = client
            .newProducer(Schema.JSON(MyClass.class))
            .topic("my-topic")
            .create();

        producer.send(new MyClass(1, 2));

        Consumer<MyClass> consumer = client
            .newConsumer(Schema.JSON(MyClass.class))
            .topic("my-topic")
            .subscriptionName("my-subscription")
            .subscribe();

        Message<MyClass> msg = consumer.receive();
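The "compatible schema evolution" point from the Schema Registry slides can be made concrete with a deliberately simplified check. The rule below is an Avro-style backward-compatibility rule chosen for illustration; it is not the broker's actual compatibility checker, and the field representation and function name are hypothetical.

```python
def is_backward_compatible(old_fields, new_fields):
    """Return True if a reader using new_fields can read data written with old_fields.

    Each schema is a dict mapping field name -> (type_name, has_default).
    Simplified rule (an assumption, not Pulsar's implementation):
      * a field present in both schemas must keep its type;
      * a field added in the new schema must carry a default value;
      * removing a field is allowed.
    """
    for name, (type_name, has_default) in new_fields.items():
        if name in old_fields:
            if old_fields[name][0] != type_name:
                return False
        elif not has_default:
            return False
    return True


v1 = {"id": ("int", False)}
v2_with_default = {"id": ("int", False), "label": ("string", True)}
v2_type_change = {"id": ("string", False)}
v2_no_default = {"id": ("int", False), "label": ("string", False)}
```

Under this rule, `v2_with_default` would be accepted as an evolution of `v1`, while the other two candidates would be rejected.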
  • 23. Geo Replication
      ● Scalable asynchronous replication
      ● Integrated in the broker message flow
      ● Simple configuration to add or remove regions
  • 24. Replicated Subscriptions: migrate subscriptions across geo-replicated clusters
      ● Consumption will restart close to where a consumer left off, with a small number of duplicates
      ● Implementation
        ○ Use markers injected into the data flow
        ○ Create a consistent snapshot of message IDs across clusters
        ○ Establish a relationship: if a consumer has consumed MA-1 in Cluster-A, it must have consumed MB-2 in Cluster-B
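The snapshot relationship above can be sketched as a lookup. This is a minimal model under stated assumptions: message IDs are plain integers (real Pulsar message IDs are structured), snapshots are already collected as `(id_in_cluster_a, id_in_cluster_b)` pairs, and the function name is hypothetical.

```python
from bisect import bisect_right


def position_in_remote(snapshots, acked_local_id):
    """Map an acknowledged position in Cluster-A to a safe position in Cluster-B.

    snapshots: list of (local_id, remote_id) pairs sorted by local_id, where
    each pair means "everything up to local_id in A corresponds to
    everything up to remote_id in B".
    Returns the remote_id of the latest snapshot at or before acked_local_id,
    or None if no snapshot has been passed yet.
    """
    local_ids = [local_id for local_id, _ in snapshots]
    index = bisect_right(local_ids, acked_local_id)
    if index == 0:
        return None
    return snapshots[index - 1][1]


snapshots = [(10, 7), (20, 18), (30, 29)]
resume_at = position_in_remote(snapshots, acked_local_id=25)
```

Because the lookup rounds down to the most recent snapshot, a consumer that moves to Cluster-B may see again the messages published between that snapshot and its last acknowledgment, which matches the "small number of duplicates" noted on the slide.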
  • 25. Multi-Tenancy: a single Pulsar cluster supports multiple users and mixed workloads
      ● Authentication, authorization, namespaces, admin APIs
      ● I/O isolation between writes and reads
        ○ Provided by BookKeeper
        ○ Ensures readers draining a backlog won't affect publishers
      ● Soft isolation
        ○ Storage quotas, flow control, back-pressure, rate limiting
      ● Hardware isolation
        ○ Constrain some tenants to a subset of brokers or bookies
  • 26. Lightweight Compute with Pulsar Functions
  • 27. Pulsar Functions
  • 28. Pulsar Functions
      ● User-supplied compute against a consumed message
        ○ ETL, data enrichment, filtering, routing
      ● Simplest possible API
        ○ Use language-specific "function" notation
        ○ No SDK required
        ○ SDK available for more advanced features (state, metrics, logging, …)
      ● Language agnostic
        ○ Java, Python and Go
        ○ Easy to support more languages
      ● Pluggable runtime
        ○ Managed or manual deployment
        ○ Run as threads, processes or containers in Kubernetes
  • 29. Pulsar Functions: examples

      Python

        def process(input):
            return input + '!'

      Java

        import java.util.function.Function;

        public class ExclamationFunction implements Function<String, String> {
            @Override
            public String apply(String input) {
                return input + "!";
            }
        }
  • 30. Pulsar Functions: state management
      ● Functions can store state in stream storage
      ● State is global and replicated
      ● Multiple instances of the same function can access the same state
      ● The Functions framework provides a simple abstraction over state
  • 31. Pulsar Functions: state management
      ● Implemented on top of the Apache BookKeeper "Table Service"
      ● BookKeeper provides a sharded key/value store based on:
        ○ Log and snapshot, stored as BookKeeper ledgers
        ○ Warm replicas that can be quickly promoted to leader
      ● In case of leader failure there is no downtime or huge log to replay
  • 32. Pulsar Functions: state example

        import org.apache.pulsar.functions.api.Context;
        import org.apache.pulsar.functions.api.Function;

        public class CounterFunction implements Function<String, Void> {
            @Override
            public Void process(String input, Context context) {
                // String.split takes a regex, so the dot must be escaped
                // to split on a literal "." rather than on every character.
                for (String word : input.split("\\.")) {
                    context.incrCounter(word, 1);
                }
                return null;
            }
        }
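A Python version of the same counter can be tried locally with a stand-in context. The `InMemoryContext` class below is hypothetical and exists only so the sketch runs without a Pulsar cluster; its `incr_counter` method is modeled on the counter call the Python Functions SDK context is understood to expose, and real state would live in BookKeeper rather than in a dict.

```python
class InMemoryContext:
    """Hypothetical stand-in for the Functions context, for local experiments only."""

    def __init__(self):
        self.counters = {}

    def incr_counter(self, key, amount):
        # Add `amount` to the counter stored under `key`, starting from zero.
        self.counters[key] = self.counters.get(key, 0) + amount

    def get_counter(self, key):
        return self.counters.get(key, 0)


def count_words(input, context):
    """Increment one counter per dot-separated token in the input string."""
    # Python's str.split treats "." as a literal separator,
    # whereas Java's String.split interprets its argument as a regex.
    for word in input.split("."):
        if word:  # skip empty tokens produced by leading or doubled dots
            context.incr_counter(word, 1)


context = InMemoryContext()
count_words("a.b.a", context)
count_words("b..c", context)
```

After the two calls, `context.get_counter("a")` and `context.get_counter("b")` both return 2, and `context.get_counter("c")` returns 1.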
  • 33. Pulsar IO: connectors framework based on Pulsar Functions
  • 34. Built-in Pulsar IO connectors
  • 35. Querying data stored in Pulsar
  • 36. Pulsar SQL
      ● Uses Presto for interactive SQL queries over data stored in Pulsar
      ● Query historic and real-time data
      ● Integrated with the schema registry
      ● Can join with data from other sources
  • 37. Pulsar SQL
      ● Reads data directly from BookKeeper into Presto, bypassing the Pulsar broker
      ● Many-to-many data reads
        ○ Data is split even within a single partition, so multiple workers can read in parallel from a single Pulsar partition
      ● Time-based indexing: use "publishTime" in predicates to reduce the data read from disk
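The two read-path ideas on this slide, pruning by publish time and reading one partition from several workers, can be sketched together. This is an illustrative model only: the `Segment` record, its millisecond fields, and the round-robin assignment are assumptions, not the Presto connector's actual split logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical index entry: one stored segment and its publish-time range."""
    segment_id: int
    first_publish_ms: int
    last_publish_ms: int


def prune_segments(segments, start_ms, end_ms):
    """Keep only segments whose publish-time range overlaps [start_ms, end_ms]."""
    return [
        s for s in segments
        if s.last_publish_ms >= start_ms and s.first_publish_ms <= end_ms
    ]


def assign_to_workers(segments, num_workers):
    """Spread the remaining segments of a single partition over workers, round-robin."""
    assignment = {worker: [] for worker in range(num_workers)}
    for index, segment in enumerate(segments):
        assignment[index % num_workers].append(segment.segment_id)
    return assignment


partition = [
    Segment(0, 0, 99),
    Segment(1, 100, 199),
    Segment(2, 200, 299),
    Segment(3, 300, 399),
]
to_read = prune_segments(partition, start_ms=150, end_ms=320)
plan = assign_to_workers(to_read, num_workers=2)
```

Here segment 0 is never read because its time range falls entirely before the predicate, and the three remaining segments are shared between two workers.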
  • 38. Pulsar Storage API
      ● Work in progress to allow direct access to data stored in Pulsar
      ● Generalization of the work done for the Presto connector
      ● Most efficient way to retrieve and process data from "batch" execution engines
  • 39. Thank You