Leverage Kafka to build a stream processing platform
1. Jessy Jordan, Staff Software Engineer at Meroxa, @jayjayjpg
Building on top of Kafka
How Meroxa leveraged Kafka to build a Stream Processing
Application Platform
Apache, Apache Kafka, Kafka and the Kafka logo are trademarks of the Apache Software Foundation. The Apache Software Foundation has no affiliation with
and does not endorse the materials provided at this event.
@meroxadata
3. 🤔
…that allowed you to process real-time data efficiently for
hundreds and thousands of users….
4. 🤔
Including a great UX: wouldn’t you want to build on the
shoulders of giants, such as Apache Kafka?
5. What is a Stream Processing Application Platform?
Allowing users to sync,
transform and persist
real-time data
Application Platform →
code-first user interface
6. What is a Stream Processing Application Platform?
7. Building a Stream Processing Application Platform
Why use Kafka?
8. Real-time stream processing essential for modern data engineering
The vast majority of Fortune 100
companies already rely on
event streaming to make use
of real-time data
Mission to build tools
equipped for real-time data
use cases
→ Kafka as de facto standard
for (event) streaming
https://kafka.apache.org/
9. Building a Streaming Platform as a Service:
Why use Kafka?
Robust
Scalable
Easy to observe
Easy to extend
10. Building a Streaming Platform as a Service:
Why use Kafka?
Robust
Scalable
Easy to observe
Easy to extend
Team expertise working
with Kafka
11. Apache Kafka as an integral part of the Meroxa platform
Technology Stack
12. Control Plane and Data Plane
[Diagram: Control Plane (REST API, Provisioner Microservice); Data Plane (MSK, connector CRDs, Controller Microservice, Kafka Connect)]
15. Provisioning
Reliance on AWS Cloud Services
Provisioning of Data Plane via AWS
CloudFormation, including:
Managed Kubernetes Service (EKS)
Fully Managed Apache Kafka on AWS
(MSK)
AWS S3, ECR (Streaming App Build)
Data Plane can also be provisioned into an
external end user's VPC
16. Kafka cluster setup and scaling
Setup: 8 partitions, replication factor of 3, brokers
spread across several AZs
Horizontal + Vertical scaling of Kafka
Connect pods
Fully managed Kafka cluster
Time + cost efficiency
Ability to focus on product development,
instead of data infrastructure operation
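To make "3 replicas, brokers across several AZs" concrete: Kafka supports rack-aware replica placement through the broker `broker.rack` setting, and the point of the setup above is that each partition keeps its copies in different AZs, so losing one AZ still leaves two replicas. The sketch below is a simplified round-robin illustration of that property only; the AZ names, broker ids and the `assign_replicas` function are hypothetical, and this is not Kafka's actual assignment algorithm.

```python
def assign_replicas(brokers_by_az, num_partitions=8, replication_factor=3):
    """Place each partition's replicas in distinct AZs (simplified round-robin).

    brokers_by_az maps a hypothetical AZ name to a list of broker ids.
    This is an illustration, not Kafka's actual rack-aware assignment.
    """
    azs = sorted(brokers_by_az)
    if replication_factor > len(azs):
        raise ValueError("replication factor exceeds the number of AZs")
    assignment = {}
    for partition in range(num_partitions):
        replicas = []
        for r in range(replication_factor):
            # Shift the starting AZ per partition so leaders (first replica) rotate.
            az = azs[(partition + r) % len(azs)]
            brokers = brokers_by_az[az]
            # Cycle through the brokers inside each AZ every len(azs) partitions.
            replicas.append(brokers[(partition // len(azs)) % len(brokers)])
        assignment[partition] = replicas
    return assignment


plan = assign_replicas({"az-a": [1, 4], "az-b": [2, 5], "az-c": [3, 6]})
print(plan[0], plan[3])  # [1, 2, 3] [4, 5, 6]
```

With three AZs and a replication factor of 3, every partition gets exactly one replica per AZ, and the leaders of the 8 partitions rotate over all six brokers.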
17. Challenges of using Cloud-Hosted Kafka
Limited up- and down-scaling of storage
Some operational burden for Kafka maintenance
remains, in contrast to cloud-native services:
upgrades
scaling
monitoring
Limitation: air-gapped environments for
external deployment
18. Deploy, destroy and modify Kafka connectors
Connector CRDs - platform
connectors based on Kafka Connect
Other CRDs also used for custom
connectors (Conduit)
Controllers create, delete, modify
connectors
[Diagram: Data Plane with MSK, connector CRDs, Controller Microservice and Kafka Connect]
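The controller behaviour on this slide can be pictured as a reconcile step: compare the connectors declared in the CRDs with what Kafka Connect currently runs, then create, modify or delete connectors until the two match. The sketch below only plans the calls against the Kafka Connect REST API (`PUT /connectors/{name}/config` creates a connector or updates its config, `DELETE /connectors/{name}` removes it). The `plan_reconcile` function and its plain-dict inputs are hypothetical stand-ins; the slides do not show Meroxa's CRD types or controller code.

```python
def plan_reconcile(desired, actual):
    """Compute Kafka Connect REST calls that move `actual` toward `desired`.

    Both arguments map connector name -> config dict. `desired` stands in for
    the connector custom resources; `actual` for what Kafka Connect reports.
    A real controller would normalize configs first, since Kafka Connect may
    echo extra keys (such as "name") back in a connector's config.
    """
    actions = []
    for name in sorted(desired):
        if actual.get(name) != desired[name]:
            # PUT /connectors/{name}/config creates the connector or updates it.
            actions.append(("PUT", f"/connectors/{name}/config", desired[name]))
    for name in sorted(set(actual) - set(desired)):
        # Connectors without a matching custom resource get deleted.
        actions.append(("DELETE", f"/connectors/{name}", None))
    return actions


desired = {"pg-source": {"connector.class": "X", "tasks.max": "1"}}
actual = {"pg-source": {"connector.class": "X", "tasks.max": "2"}, "old-sink": {"connector.class": "Y"}}
for method, path, _ in plan_reconcile(desired, actual):
    print(method, path)
```

Keeping the planning step free of network calls makes the decision logic easy to test on its own; executing the planned requests is left to a separate HTTP client.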
29. Apache Kafka connector ecosystem
Apache Kafka as OSS promotes shared & faster development of compatible data integrations
Confluent: 120+ pre-built 🔌
Debezium: 9 🔌
Custom (Kafka Connect)
30. Extending the platform with community and custom connectors
Meroxa platform uses:
31. Extending the platform with community and custom connectors
Meroxa platform uses:
Debezium connectors
32. Extending the platform with community and custom connectors
Meroxa platform uses:
Debezium connectors
Custom Kafka Connect
connectors
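As an illustration of plugging a community connector into Kafka Connect, the sketch below builds the JSON body one would `POST` to Kafka Connect's `/connectors` endpoint for a Debezium PostgreSQL source connector. The connector name, host, database and credentials are placeholders, and which connectors and settings Meroxa actually uses is not stated on the slides; the keys shown follow Debezium 2.x naming.

```python
import json


def debezium_postgres_config(name, host, dbname, user, password, topic_prefix, port=5432):
    """Build a Kafka Connect payload for a Debezium PostgreSQL source connector."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "database.hostname": host,
            "database.port": str(port),
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            # Debezium 2.x uses topic.prefix (1.x used database.server.name).
            "topic.prefix": topic_prefix,
            # Logical decoding output plugin built into PostgreSQL 10+.
            "plugin.name": "pgoutput",
            # The PostgreSQL connector runs a single task.
            "tasks.max": "1",
        },
    }


def to_request_body(payload):
    """Encode the payload as the JSON request body for POST /connectors."""
    return json.dumps(payload).encode("utf-8")


body = to_request_body(
    debezium_postgres_config("pg-source", "db.internal", "app", "cdc_user", "secret", "app")
)
```

A custom Kafka Connect connector is registered the same way; only `connector.class` and its connector-specific keys change.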
33. Apache Kafka connector ecosystem
Apache Kafka as OSS promotes shared & faster development of compatible data integrations
Confluent Debezium Kafka Connect
34. Conduit connector ecosystem
Conduit as data integration OSS with its own connector ecosystem
Connector SDK (technically a
language-agnostic framework)
gRPC interface
OpenCDC Schema Format
35. Conduit as alternative data connector framework
Enabling Kafka Connect - Conduit connector data pipelines
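Conduit's OpenCDC format describes each change as a record with a position, an operation (`create`, `update`, `delete`, `snapshot`), metadata, a key and a payload holding `before` and `after`. To give a feel for what bridging Kafka Connect and Conduit involves, the sketch below maps a Debezium-style change event (`op` codes `c`, `u`, `d`, `r`) onto that shape. It is a simplified, hypothetical converter over plain dicts that ignores schemas and leaves metadata empty; it is not how Conduit implements this.

```python
# Debezium "op" codes mapped to OpenCDC operation names.
DEBEZIUM_TO_OPENCDC = {"c": "create", "u": "update", "d": "delete", "r": "snapshot"}


def to_opencdc(key, event, position):
    """Convert a Debezium-style change event into an OpenCDC-shaped dict.

    `event` is assumed to carry "op", "before" and "after" fields;
    `position` is an opaque bytes value identifying the record's offset.
    """
    op = event["op"]
    if op not in DEBEZIUM_TO_OPENCDC:
        # e.g. "t" (truncate) has no equivalent in this simplified mapping.
        raise ValueError(f"unsupported Debezium op: {op!r}")
    return {
        "position": position,
        "operation": DEBEZIUM_TO_OPENCDC[op],
        "metadata": {},
        "key": key,
        "payload": {"before": event.get("before"), "after": event.get("after")},
    }


record = to_opencdc(
    {"id": 1},
    {"op": "u", "before": {"id": 1, "name": "a"}, "after": {"id": 1, "name": "b"}},
    b"offset-42",
)
```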
37. Extending the platform with community and custom connectors
Meroxa platform uses:
Debezium connectors
Custom Kafka Connect
connectors
Custom connectors
(Conduit)
38. Extending the platform with community and custom connectors
Apache Kafka with open-source
connector ecosystem
Debezium providing an open-source
platform for CDC, including
connectors
Conduit with its own connector
ecosystem
Including connectors integrating
back to end user’s Kafka clusters
https://conduit.io/
42. Observability for Meroxa platform end users: Connector state
Controller Microservice: Connector
Controller polls for connector
status
Running
Failed
Pending
[Diagram: Controller Microservice (Connector Controller, Conduit Controllers) reading the state of custom resources (Connector Custom Resource Definition), Kafka Connect and the Conduit server]
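The polling described above can be sketched against Kafka Connect's `GET /connectors/{name}/status` endpoint, whose response reports a state for the connector and for each of its tasks. How Meroxa maps those states onto Running, Failed and Pending is not shown on the slide; the rule below (any failure wins, everything running means Running, anything else is Pending) is an assumption, and the worker URL in the usage note is a placeholder.

```python
import json
import urllib.parse
import urllib.request


def fetch_status(connect_url, name):
    """Call Kafka Connect's GET /connectors/{name}/status and decode the JSON."""
    url = f"{connect_url}/connectors/{urllib.parse.quote(name)}/status"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def summarize(status):
    """Collapse a Kafka Connect status response into Running, Failed or Pending.

    Hypothetical mapping: any FAILED state wins; Running requires the connector
    and at least one task to be RUNNING; everything else (UNASSIGNED, PAUSED,
    RESTARTING, no tasks yet) is reported as Pending.
    """
    tasks = status.get("tasks", [])
    states = [status["connector"]["state"]] + [t["state"] for t in tasks]
    if "FAILED" in states:
        return "Failed"
    if tasks and all(s == "RUNNING" for s in states):
        return "Running"
    return "Pending"
```

A controller loop would call something like `summarize(fetch_status("http://localhost:8083", "pg-source"))` on an interval and write the result back to the connector's custom resource; keeping `summarize` separate from the HTTP call lets the mapping be tested without a running worker.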
46. How we monitor Kafka cluster and connectors
MSK metrics tracked with the Data Plane's
Prometheus instance
Transfer of metrics across plane
components with multiple Prometheus
instances
Aggregation of metrics in Datadog
→ Prometheus as cost-efficient metric tool
(open-source)
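Moving metrics between several Prometheus instances is commonly done with Prometheus federation or remote write; the slides do not say which mechanism Meroxa uses, or how metrics reach Datadog. Purely to illustrate the aggregation idea, the sketch below parses a simplified subset of the Prometheus text exposition format and sums one metric across payloads scraped from several instances. The metric name and payloads are made up, and the parser deliberately ignores the real format's escaping rules.

```python
def parse_exposition(text):
    """Parse simple Prometheus text-format samples into (name, labels, value).

    Assumes label values contain no commas, braces or escaped quotes.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and HELP/TYPE comments.
        if "{" in line:
            name, rest = line.split("{", 1)
            label_str, rest = rest.split("}", 1)
            labels = {}
            for pair in label_str.split(","):
                if pair.strip():
                    k, v = pair.split("=", 1)
                    labels[k.strip()] = v.strip().strip('"')
        else:
            name, rest = line.split(None, 1)
            labels = {}
        value = float(rest.split()[0])  # Ignore the optional timestamp.
        samples.append((name, labels, value))
    return samples


def sum_across_instances(payloads, metric):
    """Sum one metric over scrape payloads from several Prometheus instances."""
    total = 0.0
    for text in payloads:
        total += sum(v for n, _, v in parse_exposition(text) if n == metric)
    return total
```

A real setup would keep per-instance labels and let the metrics backend do the summing; the point here is only that each instance contributes its own samples to one aggregate.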
48. How Meroxa leveraged Kafka to build a Stream Processing Application
Platform
Why Kafka?
Scalable, robust and well-supported foundation for building
modern data engineering software
Creating scalable data infrastructure using Kafka
Extensibility of our platform with Debezium, Kafka Connect and
Conduit
Internal and end-user observability with Kafka Connect and logging
+ metrics tooling