Jessy Jordan, Staff Software Engineer at Meroxa, @jayjayjpg
Building on top of Kafka
How Meroxa leveraged Kafka to build a Stream Processing
Application Platform
Apache, Apache Kafka, Kafka and the Kafka logo are trademarks of the Apache Software Foundation. The Apache Software Foundation has no affiliation with and does not endorse the materials provided at this event.
@meroxadata
🤔
If you wanted to build a stream processing application platform…
…that allowed you to process real-time data efficiently for hundreds and thousands of users…
…including a great UX: wouldn’t you want to build on the shoulders of giants, such as Apache Kafka?
What is a Stream Processing Application Platform?
Allowing users to sync, transform and persist real-time data
Application Platform → code-first user interface
Building a Stream Processing Application Platform
Why use Kafka?
Real-time stream processing is essential for modern data engineering
The vast majority of Fortune 100 companies already rely on event streaming to make use of real-time data
Mission: to build tools equipped for real-time data use cases
→ Kafka as the de facto standard for (event) streaming
https://kafka.apache.org/
Building a Streaming Platform as a Service: Why use Kafka?
Robust
Scalable
Easy to observe
Easy to extend
Team expertise working with Kafka
Apache Kafka as integral part of the Meroxa platform
Technology Stack
Diagram: Control Plane (REST API, Provisioner Microservice); Data Plane (Controller Microservice, CRDs, Kafka Connect, MSK)
Orchestrating Data Applications as part of the Meroxa platform
Kubernetes Operator
Provisioning
Reliance on AWS Cloud Services
Provisioning of the Data Plane via AWS CloudFormation, incl.
Managed Kubernetes Service (EKS)
Fully Managed Apache Kafka on AWS (MSK)
AWS S3, ECR (Streaming App Build)
Data Plane provisioning to an external end-user VPC possible
Kafka cluster setup and scaling
Setup: 8 partitions, replication factor of 3, brokers across several AZs
Horizontal + vertical scaling of Kafka Connect pods
Fully managed Kafka cluster
Time + cost-efficiency
Ability to focus on product development instead of data infrastructure operation
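To make the setup above concrete, here is a minimal sketch of how 8 partitions with 3 replicas each could be spread over brokers in three availability zones. The broker IDs, the AZ names and the `assign_replicas` helper are hypothetical, and the placement rule is a simplified illustration, not the assignment logic that Kafka or MSK actually applies.

```python
def assign_replicas(brokers_by_az, partitions=8, replication_factor=3):
    """Place each partition's replicas in distinct AZs (simplified sketch)."""
    azs = sorted(brokers_by_az)
    if replication_factor > len(azs):
        raise ValueError("need at least one AZ per replica")
    assignment = {}
    for partition in range(partitions):
        replicas = []
        for replica in range(replication_factor):
            # Shift the starting AZ per partition so leaders are spread out.
            az = azs[(partition + replica) % len(azs)]
            brokers = brokers_by_az[az]
            # Rotate through the brokers inside an AZ as partitions increase.
            replicas.append(brokers[(partition // len(azs)) % len(brokers)])
        assignment[partition] = replicas
    return assignment


# Hypothetical cluster: six brokers, two per availability zone.
BROKERS = {"az-a": [1, 4], "az-b": [2, 5], "az-c": [3, 6]}
layout = assign_replicas(BROKERS)

for partition, replicas in layout.items():
    print(f"partition {partition}: brokers {replicas}")
```

With one replica per AZ, losing a single zone still leaves two copies of every partition in this sketch.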
Challenges of using Cloud-Hosted Kafka
Scaling storage up and down is limited
Still some operational burden for Kafka maintenance, in contrast to Cloud Native:
upgrades
scaling
monitoring
Limitation: air-gapped environments for external deployment
Deploy, destroy and modify Kafka connectors
Connector CRDs: platform connectors based on Kafka Connect
Other CRDs also used for custom connectors (Conduit)
Controllers create, delete and modify connectors
Diagram: Data Plane (MSK, CRDs, Controller Microservice, Kafka Connect)
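The create, delete and modify behaviour of a controller can be pictured as a reconcile step that compares the desired connectors (from the custom resources) with the connectors that currently exist. The sketch below is an assumption about the general pattern, not Meroxa's controller code; `plan_actions` and the connector names are hypothetical.

```python
def plan_actions(desired, actual):
    """Return the actions that turn `actual` into `desired`.

    Both arguments map a connector name to its configuration dict.
    """
    actions = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Desired state, as it might be declared in connector custom resources.
desired = {
    "pg-source": {"collection": "user_activity", "type": "source"},
    "kafka-destination": {"collection": "user_activity_enriched", "type": "destination"},
}
# State currently observed on the connector runtime.
actual = {
    "pg-source": {"collection": "users", "type": "source"},
    "old-sink": {"collection": "legacy", "type": "destination"},
}

actions = plan_actions(desired, actual)
for verb, name in actions:
    print(verb, name)
```

Running the same plan repeatedly until it comes back empty is what makes this style of controller converge on the declared state.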
The CLI lets the user initialize a data app as a local code repository (e.g. in Go)
The user deploys the data application via the CLI
Diagram: an HTTP request, POST /applications, arrives at the Control Plane REST API (Control Plane: REST API, Provisioner Microservice; Data Plane: Controller Microservice, CRDs, Kafka Connect, MSK)
POST /applications
{
  "spec": {
    "connectors": [
      {
        "collection": "user_activity",
        "type": "source",
        "resource": "my-postgres",
        "config": {
          "logical_replication": true
        }
      },
      {
        "collection": "user_activity_enriched",
        "type": "destination",
        "resource": "my-kafka-cluster"
      }
    ],
    "functions": [
      {
        "name": "user_activity_enriched",
        "image": "ftorres/enrich:9"
      }
    ],
    "metadata": {
      "turbine": {
        "language": "go",
        "version": "0.1.0"
      }
    }
  }
}
The client sends an intermediate representation of the user’s data application to the server
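As a rough sketch of the client side, the request above could be assembled as follows. The payload mirrors the slide; the base URL and the `build_application_request` helper are hypothetical, authentication is left out, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_application_request(base_url, spec):
    """Build (but do not send) the POST /applications request."""
    body = json.dumps({"spec": spec}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/applications",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


spec = {
    "connectors": [
        {
            "collection": "user_activity",
            "type": "source",
            "resource": "my-postgres",
            "config": {"logical_replication": True},
        },
        {
            "collection": "user_activity_enriched",
            "type": "destination",
            "resource": "my-kafka-cluster",
        },
    ],
    "functions": [{"name": "user_activity_enriched", "image": "ftorres/enrich:9"}],
    "metadata": {"turbine": {"language": "go", "version": "0.1.0"}},
}

# Placeholder host; the real API address is not given in the talk.
request = build_application_request("https://api.example.invalid", spec)
print(request.get_method(), request.full_url)
```

Serializing with `json.dumps` also guards against the trailing-comma mistakes that are easy to make when writing such a payload by hand.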
Diagram: the HTTP request reaches the Control Plane (REST API, Provisioner Microservice + Data Plane API), which upserts custom resources in the Data Plane (Controller Microservice, CRDs, Kafka Connect, MSK)
Using Kubernetes to deploy, destroy and modify Kafka connectors
Diagram labels: Controller Microservice, Connector Custom Resource Definition, Connector Controller, Kafka Connect, MSK; action: upsert custom resources
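One way a connector controller could apply a custom resource to Kafka Connect is the Kafka Connect REST call `PUT /connectors/{name}/config`, which creates the connector if it is missing and updates it otherwise. The talk does not say which call Meroxa uses, so treat the sketch below as an assumption; the resource shape, host name and `upsert_connector_request` helper are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request
from urllib.parse import quote


def upsert_connector_request(connect_url, resource):
    """Translate a connector custom resource into a Kafka Connect upsert request."""
    name = resource["metadata"]["name"]
    config = resource["spec"]["config"]
    return urllib.request.Request(
        url=f"{connect_url}/connectors/{quote(name, safe='')}/config",
        data=json.dumps(config).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical custom resource, reduced to the fields this sketch needs.
resource = {
    "metadata": {"name": "user-activity-source"},
    "spec": {
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "tasks.max": "1",
        }
    },
}

# 8083 is the default Kafka Connect REST port; the host is a placeholder.
request = upsert_connector_request("http://kafka-connect.example.invalid:8083", resource)
print(request.get_method(), request.full_url)
```

Because the call is idempotent, the controller can issue it on every reconcile without first checking whether the connector exists.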
Expanding Functionality of the Meroxa Platform
Extensibility
Apache Kafka connector ecosystem
Apache Kafka as OSS promotes shared and faster development of compatible data integrations
Confluent: 120+ pre-built connectors 🔌
Debezium: 9 connectors 🔌
Custom (Kafka Connect)
Extending the platform with community and custom connectors
Meroxa platform uses:
Debezium connectors
Custom Kafka Connect connectors
Conduit connector ecosystem
Conduit as data integration OSS with its own connector ecosystem
Connector SDK (technically a language-agnostic framework)
gRPC interface
OpenCDC Schema Format
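For readers unfamiliar with OpenCDC, the sketch below models a single change record in the style of that format: a position, an operation, metadata, a key, and a payload with the state before and after the change. The field names follow the publicly documented OpenCDC record layout as far as known here; treat them, the simplified types and the example values as assumptions to verify against the Conduit documentation.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional

# Operation names used by OpenCDC-style records.
OPERATIONS = ("create", "update", "delete", "snapshot")


@dataclass
class Change:
    before: Optional[dict] = None  # row state before the change, if known
    after: Optional[dict] = None   # row state after the change, if any


@dataclass
class Record:
    position: str   # where the record sits in the source (simplified to str)
    operation: str
    key: Any
    payload: Change
    metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {self.operation}")


# Hypothetical record for a newly inserted row in a user_activity table.
record = Record(
    position="lsn-0001",
    operation="create",
    key={"id": 42},
    payload=Change(after={"id": 42, "action": "login"}),
    metadata={"collection": "user_activity"},
)
print(asdict(record))
```

A shared record shape like this is what lets connectors written in different languages exchange data over the same gRPC interface.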
Conduit as alternative data connector framework
Enabling Kafka Connect - Conduit connector data pipelines
Extending the platform with community and custom connectors
Meroxa platform uses:
Debezium connectors
Custom Kafka Connect connectors
Custom connectors (Conduit)
Extending the platform with community and custom connectors
Apache Kafka with its open-source connector ecosystem
Debezium providing an open-source platform for CDC, incl. connectors
Conduit with its own connector ecosystem
Including connectors integrating back to the end user’s Kafka clusters
https://conduit.io/
Monitoring Streaming on the Meroxa Platform
Observability
Observability for Meroxa platform end users
Observability for Meroxa platform end users: Connector state
Controller Microservice: the Connector Controller polls for connector status
Running
Failed
Pending
Diagram labels: Controller Microservice, Connector Custom Resource Definition, Connector Controller, Kafka Connect, Conduit Controllers, Conduit server; action: read state of custom resources
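The polling step can be sketched as a function that reduces a Kafka Connect status response to one of the three states shown to end users. The response shape follows Kafka Connect's `GET /connectors/{name}/status` endpoint; the mapping rule itself (any failure wins, everything running means Running, anything else is Pending) and the `platform_state` name are assumptions, not Meroxa's documented logic.

```python
def platform_state(status):
    """Collapse connector and task states into Running, Failed or Pending."""
    states = [status["connector"]["state"]]
    states += [task["state"] for task in status.get("tasks", [])]
    if "FAILED" in states:
        return "Failed"
    if all(state == "RUNNING" for state in states):
        return "Running"
    # UNASSIGNED, PAUSED and similar states are treated as not yet running.
    return "Pending"


# Example responses in the shape returned by the status endpoint.
healthy = {
    "name": "user-activity-source",
    "connector": {"state": "RUNNING"},
    "tasks": [{"id": 0, "state": "RUNNING"}],
}
broken = {
    "name": "user-activity-source",
    "connector": {"state": "RUNNING"},
    "tasks": [{"id": 0, "state": "FAILED"}],
}
starting = {
    "name": "user-activity-source",
    "connector": {"state": "UNASSIGNED"},
    "tasks": [],
}

for status in (healthy, broken, starting):
    print(status["connector"]["state"], "->", platform_state(status))
```

Checking task states as well as the connector state matters because a connector can report RUNNING while one of its tasks has failed.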
Observability for Meroxa platform end users: Connector Logs
Simple aggregation and formatting
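A simple form of log aggregation and formatting is to merge the per-connector log streams by timestamp and prefix every line with its connector name. The sketch below illustrates that idea only; the input shape, the connector names and the `aggregate_logs` helper are hypothetical and not taken from the Meroxa platform.

```python
import heapq
from datetime import datetime, timezone


def aggregate_logs(streams):
    """Merge time-sorted log streams and format each entry as one line.

    `streams` maps a connector name to a list of (unix_seconds, message)
    tuples that is already sorted by time.
    """
    tagged = (
        [(ts, name, message) for ts, message in entries]
        for name, entries in streams.items()
    )
    for ts, name, message in heapq.merge(*tagged):
        stamp = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        yield f"{stamp} [{name}] {message}"


# Small timestamps (seconds since the epoch) keep the example easy to check.
streams = {
    "pg-source": [(0, "snapshot started"), (120, "snapshot finished")],
    "kafka-destination": [(60, "connected to cluster")],
}

lines = list(aggregate_logs(streams))
for line in lines:
    print(line)
```

`heapq.merge` keeps the streams lazy, so the same approach works when the inputs are iterators over large log files.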
How we monitor the Kafka cluster and connectors
MSK metrics tracked with a Prometheus instance in the Data Plane
Transfer of metrics across plane components with multiple Prometheus instances
Aggregation of metrics in Datadog
→ Prometheus as a cost-efficient (open-source) metrics tool
Sum Up
How Meroxa leveraged Kafka to build a Stream Processing Application Platform
Why Kafka?
Scalable, robust and well-supported foundation for building modern data engineering software
Creating scalable data infrastructure using Kafka
Extensibility of our platform with Debezium, Kafka Connect and Conduit
Internal and end-user observability with Kafka Connect and logging + metrics tooling
Thank you!
@meroxadata
discord.meroxa.com
🌐 meroxa.com

 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Leverage Kafka to build a stream processing platform

  • 1. Jessy Jordan, Staff Software Engineer at Meroxa, @jayjayjpg. Building on top of Kafka: How Meroxa leveraged Kafka to build a Stream Processing Application Platform. Apache, Apache Kafka, Kafka and the Kafka logo are trademarks of the Apache Software Foundation. The Apache Software Foundation has no affiliation with and does not endorse the materials provided at this event. @meroxadata
  • 2. 🤔 If you wanted to build a stream processing application platform…
  • 3. 🤔 …that allowed you to process real-time data efficiently for hundreds and thousands of users…
  • 4. 🤔 Including a great UX: wouldn’t you want to build on the shoulders of giants, such as Apache Kafka?
  • 5. What is a Stream Processing Application Platform? Allowing users to sync, transform and persist real-time data. Application Platform → code-first user interface.
  • 6. What is a Stream Processing Application Platform?
  • 7. Building a Stream Processing Application Platform: Why use Kafka?
  • 8. Real-time stream processing is essential for modern data engineering. The vast majority of Fortune 100 companies already rely on event streaming to make use of real-time data. Mission: to build tools equipped for real-time data use cases → Kafka as the de facto standard for (event) streaming. https://kafka.apache.org/
  • 9. Building a Streaming Platform as a Service: Why use Kafka? Robust. Scalable. Easy to observe. Easy to extend.
  • 10. Building a Streaming Platform as a Service: Why use Kafka? Robust. Scalable. Easy to observe. Easy to extend. Team expertise working with Kafka.
  • 11. Apache Kafka as an integral part of the Meroxa platform: Technology Stack
  • 12. [Architecture diagram: Control Plane, Data Plane, REST API, Provisioner Microservice, MSK, CRDs, Controller Microservice, Kafka Connect]
  • 13. Orchestrating Data Applications as part of the Meroxa platform: Kubernetes Operator
  • 14. [Architecture diagram: Control Plane, Data Plane, REST API, Provisioner Microservice, MSK, CRDs, Controller Microservice, Kafka Connect]
  • 15. Provisioning: Reliance on AWS Cloud Services. Provisioning of the Data Plane via AWS CloudFormation, incl. Managed Kubernetes Service (EKS); Fully Managed Apache Kafka on AWS (MSK); AWS S3, ECR (Streaming App Build). Data Plane provisioning to an external end user VPC is possible.
  • 16. Kafka cluster setup and scaling. Setup: 8 partitions, 3 replicas, brokers across several AZs. Horizontal + vertical scaling of Kafka Connect pods. Fully managed Kafka cluster: time and cost efficiency; ability to focus on product development instead of data infrastructure operation.
  • 17. Challenges of using Cloud-Hosted Kafka. Up- and down-scaling of storage is limited. Still some operational burden for Kafka that goes into maintenance, in contrast to Cloud Native: upgrades, scaling, monitoring. Limitation: air-gapped environments for external deployment.
  • 18. Deploy, destroy and modify Kafka connectors. Connector CRDs: platform connectors based on Kafka Connect. Other CRDs are also used for custom connectors (Conduit). Controllers create, delete and modify connectors. [Diagram: Data Plane, MSK, CRDs, Controller Microservice, Kafka Connect]
  • 21. The CLI allows the user to initialize a data app as a local code repository (e.g. Go).
  • 22. The user deploys the data application via the CLI.
  • 23. [Architecture diagram: Control Plane, Data Plane, REST API, Provisioner Microservice, Controller Microservice, CRDs, Kafka Connect, MSK] HTTP Request: POST /applications
  • 24. [Architecture diagram: Control Plane, Data Plane, REST API, Provisioner Microservice, Controller Microservice, CRDs, Kafka Connect, MSK] Client sends an intermediary representation of the user’s data application to the server: POST /applications { "spec": { "connectors": [ { "collection": "user_activity", "type": "source", "resource": "my-postgres", "config": { "logical_replication": true } }, { "collection": "user_activity_enriched", "type": "destination", "resource": "my-kafka-cluster" } ], "functions": [ { "name": "user_activity_enriched", "image": "ftorres/enrich:9" } ], "metadata": { "turbine": { "language": "go", "version": "0.1.0" } } } }
  • 25. [Architecture diagram: Control Plane, Data Plane, REST API, Provisioner Microservice, MSK, CRDs, Controller Microservice, Kafka Connect] HTTP Request
  • 26. [Architecture diagram: Control Plane, Data Plane, REST API, Provisioner Microservice + Data Plane API, Controller Microservice, CRDs, Kafka Connect, MSK] HTTP Request. Upsert custom resources.
  • 27. Using Kubernetes to deploy, destroy and modify Kafka connectors. [Diagram: Controller Microservice, Connector Controller, Connector Custom Resource Definition, Kafka Connect, MSK] Upsert custom resources.
  • 28. Expanding Functionality of the Meroxa Platform: Extensibility
  • 29. Apache Kafka connector ecosystem. Apache Kafka as OSS promotes shared & faster development of compatible data integrations. 120+ pre-built 🔌 Confluent; 9 🔌 Debezium; Custom (Kafka Connect).
  • 30. Extending the platform with community and custom connectors. Meroxa platform uses:
  • 31. Extending the platform with community and custom connectors. Meroxa platform uses: Debezium connectors.
  • 32. Extending the platform with community and custom connectors. Meroxa platform uses: Debezium connectors; custom Kafka Connect connectors.
  • 33. Apache Kafka connector ecosystem. Apache Kafka as OSS promotes shared & faster development of compatible data integrations. Confluent; Debezium; Kafka Connect.
  • 34. Conduit connector ecosystem. Conduit as data integration OSS with its own connector ecosystem. Connector SDK (technically a language-agnostic framework); gRPC interface; OpenCDC Schema Format.
  • 35. Conduit as alternative data connector framework. Enabling Kafka Connect - Conduit connector data pipelines.
  • 36. Conduit as alternative data connector framework. Enabling Kafka Connect - Conduit connector data pipelines.
  • 37. Extending the platform with community and custom connectors. Meroxa platform uses: Debezium connectors; custom Kafka Connect connectors; custom connectors (Conduit).
  • 38. Extending the platform with community and custom connectors. Apache Kafka with its open-source connector ecosystem. Debezium providing an open-source platform for CDC, incl. connectors. Conduit with its own connector ecosystem, including connectors integrating back to the end user’s Kafka clusters. https://conduit.io/
  • 39. Monitoring Streaming on the Meroxa Platform: Observability
  • 40. Observability for Meroxa platform end users
  • 41. Observability for Meroxa platform end users: Connector state. Running; Failed; Pending.
  • 42. Observability for Meroxa platform end users: Connector state. Controller Microservice: the Connector Controller polls for connector status (Running, Failed, Pending). [Diagram: Controller Microservice, Connector Controller, Conduit Controllers, Connector Custom Resource Definition, Kafka Connect, Conduit server] Read state custom resources.
  • 43. Observability for Meroxa platform end users: Connector Logs
  • 44. Observability for Meroxa platform end users
  • 45. Observability for Meroxa platform end users: Simple aggregation and formatting.
  • 46. How we monitor Kafka cluster and connectors. MSK metrics tracked with a Prometheus Data Plane instance. Transfer of metrics across plane components with multiple Prometheus instances. Aggregation of metrics in DataDog. → Prometheus as cost-efficient metric tool (open-source).
  • 48. How Meroxa leveraged Kafka to build a Stream Processing Application Platform. Why Kafka? Scalable, robust and well-supported foundation for building modern data engineering software. Creating scalable data infrastructure using Kafka. Extensibility of our platform with Debezium, Kafka Connect and Conduit. Internal and end-user observability with Kafka Connect and logging + metrics tooling.
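
Slide 24 shows the intermediary representation that the client posts to /applications. As a rough sketch, the same payload could be assembled and sanity-checked in Python before it is sent. The field names and values are copied from the slide; the helper names `build_application_spec` and `validate_spec`, and the validation rules themselves, are assumptions made for illustration and are not part of the Meroxa API.

```python
import json


def build_application_spec():
    """Return the application payload from slide 24 as a Python dict."""
    return {
        "spec": {
            "connectors": [
                {
                    "collection": "user_activity",
                    "type": "source",
                    "resource": "my-postgres",
                    "config": {"logical_replication": True},
                },
                {
                    "collection": "user_activity_enriched",
                    "type": "destination",
                    "resource": "my-kafka-cluster",
                },
            ],
            "functions": [
                {"name": "user_activity_enriched", "image": "ftorres/enrich:9"}
            ],
            "metadata": {"turbine": {"language": "go", "version": "0.1.0"}},
        }
    }


def validate_spec(payload):
    """Hypothetical client-side checks; the real server rules are not in the slides."""
    errors = []
    connectors = payload.get("spec", {}).get("connectors", [])
    types = [c.get("type") for c in connectors]
    if "source" not in types:
        errors.append("at least one source connector is required")
    if "destination" not in types:
        errors.append("at least one destination connector is required")
    for index, connector in enumerate(connectors):
        for key in ("collection", "type", "resource"):
            if key not in connector:
                errors.append(f"connector {index} is missing '{key}'")
    return errors


# Serialize to strict JSON (no trailing commas) for the request body.
body = json.dumps(build_application_spec(), indent=2)
problems = validate_spec(build_application_spec())
```

Building the body from a dict and serializing it with `json.dumps` avoids hand-written JSON errors such as trailing commas, which strict parsers reject.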
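
Slides 18 and 27 describe controllers that create, delete and modify connectors based on custom resources. A minimal sketch of that reconcile idea follows, assuming the desired state comes from the custom resources and the actual state from the connector runtime. The function `plan_actions` and the example connector names are hypothetical; the slides do not show the real controller code.

```python
def plan_actions(desired, actual):
    """Compare desired and actual connector configs and list the needed actions.

    Both arguments map a connector name to its config dict.
    Returns a list of (action, name) tuples in a deterministic order.
    """
    actions = []
    for name, config in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != config:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical example state, for illustration only.
desired = {
    "pg-source": {"resource": "my-postgres"},
    "kafka-dest": {"resource": "my-kafka-cluster", "version": 2},
}
actual = {
    "kafka-dest": {"resource": "my-kafka-cluster", "version": 1},
    "old-connector": {"resource": "legacy"},
}
plan = plan_actions(desired, actual)
```

Because the plan is derived only from the difference between the two states, running it again after the actions succeed yields an empty list, which is the property a polling controller relies on.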
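
Slide 42 says the Connector Controller polls for connector status and surfaces it as Running, Failed or Pending. The sketch below assumes the status document has the shape returned by Kafka Connect's `GET /connectors/{name}/status` endpoint (a `connector` object and a `tasks` list, each carrying a `state`). The mapping onto the three platform states is a guess for illustration, not Meroxa's actual rule, and `summarize_state` is a hypothetical helper.

```python
def summarize_state(status):
    """Collapse a Kafka Connect status document into one platform-level state."""
    states = [status["connector"]["state"]]
    states.extend(task["state"] for task in status.get("tasks", []))
    if "FAILED" in states:
        return "Failed"  # any failed connector or task marks the whole connector failed
    if all(state == "RUNNING" for state in states):
        return "Running"
    return "Pending"  # e.g. UNASSIGNED or PAUSED: not failed, but not fully running


# Hypothetical sample response, for illustration only.
sample_status = {
    "name": "pg-source",
    "connector": {"state": "RUNNING", "worker_id": "10.0.0.1:8083"},
    "tasks": [
        {"id": 0, "state": "RUNNING", "worker_id": "10.0.0.1:8083"},
        {"id": 1, "state": "UNASSIGNED", "worker_id": "10.0.0.2:8083"},
    ],
}
state = summarize_state(sample_status)
```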