4. Challenge #1: Centralized Teams
BIG DATA
PLATFORM
Ingest Process Serve
Requirements Requirements
Data Data
Data Producers
Domain Expertise,
Direct Influence
on Data Quality
Central Data Team
Data Engineering
Capability,
Responsible for
Data Quality
Data Consumers
Interest in Data Quality,
Data Application
Experience
FAIL TO
BOOTSTRAP
FAIL TO SCALE
SOURCES
FAIL TO SCALE
CONSUMERS
FAIL TO MATERIALIZE
DATA-DRIVEN VALUE
Data Architectures
&
Organization
Today
Risks of Creating
a Disconnect between
Data Owners and
Users
Failure Symptoms
for Creating
Data Driven Value
Centralized
Architecture
Technically
Decomposed
Hyper-Specialized
Silo Delivery
5. Challenge #2: Data Sharing & System Decoupling
Get away from Point-to-Point Data Sharing
6. Need: Operational Data Platform
Scalable and completely decoupled architecture
Source
Source
Source
Source
Data Product
Data Product
Data Product
Data Product
Source
Source
Source
Source
Data Product
Data Product
Data Product
Data Product
for high-quality, self-service access to real-time data streams
Combine and enrich data
from anywhere to anywhere
for real-time data sharing
and greater reuse
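To make the scaling argument behind slides 5 and 6 concrete, here is a minimal sketch that counts integrations under two simplifying assumptions that are not in the slides: in the point-to-point case every source feeds every consumer directly, and in the decoupled case every system connects exactly once to a shared platform. The function names are hypothetical.

```python
def point_to_point_links(num_sources: int, num_consumers: int) -> int:
    """Assumed worst case: each source is wired directly to each consumer."""
    return num_sources * num_consumers


def decoupled_links(num_sources: int, num_consumers: int) -> int:
    """Assumed hub model: each system connects once to the shared platform."""
    return num_sources + num_consumers


if __name__ == "__main__":
    # Compare both models as the number of systems grows.
    for n in (4, 8, 16):
        print(n, point_to_point_links(n, n), decoupled_links(n, n))
```

Under these assumptions the point-to-point count grows with the product of sources and consumers, while the decoupled count grows with their sum.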
7. Challenge #3: Bridging the Operational & Analytical Worlds
DATA PIPELINES
OPERATIONAL SYSTEMS ANALYTICAL SYSTEMS
Domain 1
Operational
Database
Domain 2
Operational
Database
Domain 3
Operational
Database
Data Lake
Lake House
Data Mart
Data
Warehouse
ML/AI
Reports &
Dashboards
8. Analytical Repositories
The reality with today’s data integration strategy
A giant mess of monolithic point-to-point connections
with data fidelity and governance challenges
Operational Repository
Operational Repository
Operational Repository
Operational Repository
Operational Repository
Operational Repository
9. Challenge #4: The “modern” data stack is built on a legacy paradigm
Core processing
systems
External data
Unstructured data
Systems of Record
Browser mobile
Logs
Telemetry
SaaS apps
…
Infrastructure
Data Sources
Data Warehouse
Operational
Systems
Extract/Load
1
2
Transform
3
Reverse ETL
4
4
BI Tools
SaaS
Applications
10. Core processing
systems
External data
Unstructured data
Systems of Record
Browser mobile
Logs
Telemetry
SaaS apps
…
Infrastructure
Data Sources
Data Warehouse
Operational
Systems
Extract/Load
1
2
Transform
3
Reverse ETL
4
4
BI Tools
SaaS
Applications
Batch-based, low-fidelity, stale data is unsuitable for real-time operational and BI use cases
Immature governance and observability create data access conflicts between IT ops and engineering teams
Infra-heavy data processing leads to scale and performance challenges and high overall TCO
Over-reliance on central data teams with limited domain knowledge creates innovation bottlenecks
Inflexible monolithic design results in multiple siloed, purpose-built pipelines, increasing sprawl
Stale data, rigid engineering, and poor lineage and governance slow developer agility and innovation
Today’s data integration approaches create a chaotic
and unscalable data foundation
11. Data Mesh - Domain Data as a Product
Planning
Transformation
Visualization
ETL
ETL
ETL
Loyalty
Program
API
Data
Products
API
Data
Customer
Data
API
Data
Risk
API
Payments
API
T
T
T
12. Data Mesh is an Architecture with Implementation
customer
analytics
Operational
Data Platform
Analytical
Data Platform
transaction
system
fraud
detection
payments
customer
onboarding
bank
account
Search listing
by term
asset
management
Identity Provider
Policy Provider
Data Catalog
Auditing
Self-Service Data Portal
Enterprise
Infrastructure
13. Cloud Data Systems
Data Stores
(e.g. PostgreSQL, MongoDB Atlas, MySQL, Oracle DB)
Application Data
(e.g. Salesforce, ServiceNow, GitHub, Zendesk)
Log Data &
Messaging Systems
(e.g. MQTT, Azure Service Bus, Azure Event Hubs, Solace)
ksqlDB
Confluent
Source
connectors
Optional: SMT
Sink connectors
Optional: SMT
Power your operational and analytical systems with
real-time streaming data
OLTP Systems
MongoDB
Atlas
Amazon
DynamoDB
Azure
Cosmos DB
Google
BigTable
Cassandra
Redis
PostgreSQL
MySQL
OLAP Systems
Amazon
Redshift
Snowflake
Google
BigQuery
Azure Synapse
Analytics
Databricks
Delta Lake
Amazon S3
Google Cloud
Storage
Azure Blob
Storage
Stream
Governance
Pre-built
Connectors
14. Data Mesh is an Architecture with Implementation
...
Device
Logs ... ...
...
Data Stores Logs 3rd Party Apps Custom Apps / Microservices
Real-time
Inventory
Real-time Fraud
Detection
Real-time
Customer 360
Machine
Learning
Models
Real-time Data
Transformation ...
Event-Streaming Applications
Universal Event Pipeline
S3
SaaS
apps
App
Mainframes Snowflake Splunk
15. Confluent is proven to be ready EVERYWHERE
Private Cloud
Deploy on premises with
Confluent Platform
Public/Multi-Cloud
Leverage a fully managed
service with Confluent
Cloud
Hybrid Cloud
Build a persistent bridge
from datacenter to cloud
with Cluster Linking
17. Areas of Investment
Classify and
understand
the meaning
of the data in
Kafka.
Freedom of Choice
SELF-SERVICE
DATA CATALOG
DATA QUALITY
Enforce and
understand the
quality of the
data in Kafka.
Follow and
understand
the flow of
the data in
Kafka.
DATA LINEAGE DATA POLICIES
Event Streaming Platform
Enforce
policies
around who
can see and
do what.
Decentralized
data
management
in Kafka.
18. Confluent’s Areas of Investment
Search and discover
Metadata index | UI & API access
Classification
Tags | Generic key values
Central metadata repository
Technical metadata | Business metadata
Understand the meaning of
the data in Kafka
Freedom of Choice
Profiling
Data insights
Monitor quality
API | UI
Schema validation
Client side | Broker side
Point in time lineage
Lookup lineage by date
Inter cluster lineage
Flow of data across clusters
Intra cluster lineage
Flow of data inside a cluster
EVENTS LINEAGE
EVENTS CATALOG
EVENTS QUALITY
Enterprise license
Understand the quality of
the data in Kafka
Understand the flow of the
data in Kafka
Fully Managed Cloud Service Self-managed Software
Apache Kafka
Live
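As an illustration of the "Schema validation (client side)" item on slide 18, the following is a minimal sketch of a producer checking a record before sending it. It is plain Python and does not use Schema Registry or any Confluent API; the schema, the field names, and the `validate` helper are all hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical schema for a payment event: field name -> expected Python type.
PAYMENT_SCHEMA: Dict[str, type] = {
    "payment_id": str,
    "amount": float,
    "currency": str,
}


def validate(record: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    errors: List[str] = []
    for field_name, expected in schema.items():
        if field_name not in record:
            errors.append(f"missing field: {field_name}")
        elif not isinstance(record[field_name], expected):
            errors.append(
                f"wrong type for {field_name}: expected {expected.__name__}"
            )
    return errors
```

A producer could refuse to send any record for which `validate` returns a non-empty list. Broker-side validation would apply a comparable check on the server, which this sketch does not attempt to model.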
19. Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
stream catalog
Increase collaboration and productivity
with self-service data discovery
Tag and classify data to increase the value of your catalog
Share what
you build
Find what
you need
20. stream catalog
Tags
Key value pairs
TECHNICAL METADATA BUSINESS METADATA
INDEX
ENTITY TYPES
Owner
...
Name
Creation date
Integration with 3rd parties
stream catalog
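The catalog model on slides 19 and 20 (entities with technical metadata, tags, and key-value business metadata, searchable through an index) can be sketched as a small in-memory structure. This is an illustrative assumption about the shape of the data, not the actual stream catalog API; `CatalogEntry`, `StreamCatalog`, and their methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CatalogEntry:
    name: str    # technical metadata
    owner: str   # technical metadata
    tags: Set[str] = field(default_factory=set)
    # Business metadata stored as free-form key-value pairs.
    business_metadata: Dict[str, str] = field(default_factory=dict)


class StreamCatalog:
    """Hypothetical in-memory catalog keyed by entry name."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        """Share what you build: add or overwrite an entry."""
        self._entries[entry.name] = entry

    def find_by_tag(self, tag: str) -> List[str]:
        """Find what you need: names of all entries carrying the tag."""
        return sorted(e.name for e in self._entries.values() if tag in e.tags)
```

For example, registering a `customers` entry tagged `PII` and then calling `find_by_tag("PII")` returns the names of every entry with that tag, which is the discovery step the slides describe.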
26. Next-gen data lifecycle with Confluent
Topic 1
Topic 2
Topic 3
Topic 4
Topic 5
Topic 6
Topic A
Topic B
Topic C
Topic D
Data Product 1
Data Product 2
Connect
Continuously
stream data to
Confluent
Real-time Apps
SaaS Apps
Data Warehouse
Dashboards
Govern
Tag and
secure data
streams
Enrich
Process and
cleanse data
Build
Create ready-to-share,
ready-to-use
data products
Share
Multicast to any
destination
Database
SaaS app
Custom producer
System of Record
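The five stages on slide 26 (Connect, Govern, Enrich, Build, Share) can be read as a chain of small steps over a stream of events. The sketch below models that chain with plain Python generators and lists; it does not use Kafka or any Confluent client, and the event fields, the tag value, and all function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, object]


def connect(source: Iterable[Event]) -> Iterator[Event]:
    """Connect: read events from a source, copying each one."""
    for event in source:
        yield dict(event)


def govern(events: Iterable[Event], tag: str) -> Iterator[Event]:
    """Govern: attach a classification tag to every event."""
    for event in events:
        yield {**event, "tags": [tag]}


def enrich(events: Iterable[Event]) -> Iterator[Event]:
    """Enrich: drop events without an amount and normalise the currency."""
    for event in events:
        if event.get("amount") is None:
            continue
        yield {**event, "currency": str(event["currency"]).upper()}


def build(events: Iterable[Event]) -> List[Event]:
    """Build: materialise the cleaned stream as a data product."""
    return list(events)


def share(product: List[Event], destinations: Dict[str, List[Event]]) -> None:
    """Share: multicast the same data product to every destination."""
    for sink in destinations.values():
        sink.extend(dict(event) for event in product)
```

Chaining the steps as `build(enrich(govern(connect(raw), "payments")))` yields one data product that `share` can deliver to several destinations at once, mirroring the "multicast to any destination" step.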
27. Overall Architecture
Kafka
search lineage manage
Integration
RBAC/
ABAC /
PBAC
quality policies
Events Quality
(Broker interceptors, content validation rules)
on-prem hosted
Technical metadata
Business metadata
Lineage metadata
REST API
GraphQL
Events
Portal
Events Catalog
Schema Registry
Connectors Client Applications
Java, .NET, Python, ..
REST Proxy
Analytics
Data storage
Data Catalog
Non-streaming
data sources
DB MQ Host
28. Confluent acts as the Central Nervous System to connect all of your apps & data systems
Databases
Data Warehouses
AWS, Azure, GCP
Legacy Infra / Mainframes
Custom Apps
SaaS Apps
Legacy Apps
AWS, Azure, GCP
Databases
Data Warehouses
Legacy Infra / Mainframes
Custom Apps
SaaS Apps
Legacy Apps
31. SWITZERLAND
Backbone for a scalable business
Networks & Data Mesh
Modern Payments in Cloud
Central Data Platform
Convergence of BI & CX
IoT & Microservices
Governed Data Mesh