Databus
LinkedIn’s Change Data Capture Pipeline
Databus Team @ LinkedIn
Sunil Nagaraj
http://www.linkedin.com/in/sunilnagaraj
Eventbrite
May 07 2013
Talking Points
• Motivation and Use-Cases
• Design Decisions
• Architecture
• Sample Code
• Performance
• Databus at LinkedIn
• Review
The Consequence of Specialization in Data Systems
Data Consistency is critical!
Data Flow is essential
Two Ways
• Extract changes from the database commit log
  – Tough but possible
  – Consistent!
• Application code dual-writes to the database and a pub-sub system
  – Easy on the surface
  – Consistent?
Change Extract: Databus
[Diagram: updates are written to the Primary Data Store; Databus extracts the data change events and fans them out to standardization services, the Search Index, the Graph Index, and Read Replicas.]
Example: External Indexes
• Description
  – Full-text and faceted search over profile data
• Requirements
  – Timeline consistency
  – Guaranteed delivery
  – Low latency
  – User-space visibility
[Diagram: members update their skills on linkedin.com; Databus carries the change events to the People Search Index, which serves search results to recruiters on recruiter.linkedin.com.]
A brief history of Databus
• 2006–2010: Databus became an established and vital piece of infrastructure for consistent data flow from Oracle
• 2011: Databus (V2) addressed scalability and operability issues
• 2012: Databus supported change capture from Espresso
• 2013: Open-source Databus
  – https://github.com/linkedin/databus
Databus Eco-system: Participants
[Diagram: the Primary Data Store (source) feeds Change Data Capture, which publishes change data into the Change Event Stream; events flow from the stream to Consumer Applications.]
• Source
  – Supports transactions
• Change Data Capture
  – Extracts the changed data of committed transactions
  – Transforms it into 'user-space' events
  – Preserves atomicity
• Consumer Application
  – Receives change events quickly
  – Preserves consistency with the source
Databus Eco-System: Realities
[Diagram: the source databases feed Change Data Capture and the Change Event Stream; a fast consumer wants the changes since the last 5 seconds, a slow consumer wants the changes since last week, and a new consumer wants every change, all while schemas evolve at the source.]
• The source cannot be burdened by 'long look-back' extracts
• Applications cannot be forced to move to the latest version of a schema all at once
Key Design Decisions: Semantics
• Change Data Capture uses logical clocks attached to the source (SCN)
  – The change data stream is ordered by SCN
  – Simplifies data portability; the change stream is f(SourceState, SCN)
• Applications are idempotent (see the sketch below)
  – At-least-once delivery
  – Track progress reliably (SCN)
  – Timeline consistency
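With at-least-once delivery, idempotency usually comes down to remembering the last applied SCN and skipping anything at or below it. A minimal sketch of that pattern (the class and method names are illustrative, not part of the Databus API):

```java
// Illustrative only: an idempotent consumer tracks the highest SCN it has
// applied and ignores redelivered events, so at-least-once delivery is safe.
public class IdempotentApplier {
    private long lastAppliedScn; // restored from durable storage on startup

    public IdempotentApplier(long persistedScn) {
        this.lastAppliedScn = persistedScn;
    }

    /** Applies a change only if it is newer than everything applied so far. */
    public void maybeApply(long scn, Runnable applyChange) {
        if (scn <= lastAppliedScn) {
            return; // duplicate delivery; already reflected downstream
        }
        applyChange.run();    // update the derived store
        lastAppliedScn = scn; // persist periodically for lossless recovery
    }
}
```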
Key Design Decisions: Systems
• Isolate fast consumers from slow consumers
  – Workload separation between online (recent), catch-up (old), and bootstrap (all)
• Isolate sources from consumers
  – Schema changes
  – Physical layout changes
  – Speed mismatch
• Schema-awareness
  – Compatibility checks
  – Filtering at the change stream
The Components of Databus
[Diagram: the DB feeds Change Capture, which publishes change data into an in-memory Event Buffer inside the Relay. A Databus Client embedded in each Consumer Application pulls online changes from the Relay. The Bootstrap service, fed by a Bootstrap Consumer, maintains a Log Store and a Snapshot Store and serves older changes to slow applications and consistent snapshots to new applications. Metadata (schemas) is shared across the pipeline.]
Change Data Capture
• Contains logic to extract changes from the source, starting from a specified SCN
• Implementations
  – Oracle
    • Trigger-based
    • Commit ordering
    • Special instrumentation required
  – MySQL
    • Custom-storage-engine based
• EventProducer API (sketched below)
  – start(SCN)    // capture changes from the specified SCN
  – SCN getSCN()  // return the latest SCN
[Diagram: the Change Data Capture component reads from the database using its schemas and tracks an SCN.]
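For illustration, the EventProducer contract named above might look like this in Java; it is a sketch based on the two methods listed on the slide, not the exact interface in the open-source repo.

```java
// Sketch of the change-capture producer contract described on this slide.
// Real Databus producer interfaces may differ in naming and error handling.
public interface EventProducer {
    /** Begin capturing changes from the source, starting at the given SCN. */
    void start(long sinceScn);

    /** Highest SCN captured so far; used for restarts and monitoring. */
    long getSCN();
}
```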
MySQL: Change Data Capture
[Diagram: a MySQL master replicates to a Databus-controlled MySQL slave, which pushes events to the Relay over a TCP channel.]
• MySQL replication takes care of
  – bin-log parsing
  – the protocol between master and slave
  – handling restarts
• Relay
  – Provides a TCP protocol interface to push events
  – Controls and manages the MySQL slave
Publish – Subscribe API
[Diagram: Change Data Capture extracts (src, SCN) from the DB and publishes into the in-memory Event Buffer; Consumers subscribe to (src, SCN).]
• EventBuffer (publish side)
  – startEvents()                                         // e.g. new txn
  – appendEvent(DbusEvent(enc(schema, changeData), src, pk), ...)
  – endEvents(SCN)                                        // e.g. end of txn; commit
  – rollbackEvents()                                      // abort this window
• Consumer (subscribe side)
  – register(source, 'Callback')
  – onStartConsumption()                                  // once
  – onStartDataEventSequence(SCN)
  – onStartSource(src, Schema)
  – onDataEvent(DbusEvent e, …)
  – onEndSource(src, Schema)
  – onEndDataEventSequence(SCN)
  – onRollback(SCN)
  – onStopConsumption()                                   // once
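Taken together, the two call sequences above give the stream its transaction semantics: everything appended between startEvents() and endEvents(SCN) becomes visible to consumers atomically, or not at all. A sketch of how one committed transaction maps onto the publish side (the interfaces restate the slide's method names with simplified signatures; they are not the exact open-source types):

```java
import java.util.List;

// Simplified publish-side types, mirroring the method names on this slide.
interface EventBuffer {
    void startEvents();                // e.g. new txn: open a window
    void appendEvent(DbusEvent event); // one changed row
    void endEvents(long scn);          // e.g. end of txn: commit the window
    void rollbackEvents();             // abort this window
}

class DbusEvent {
    final byte[] encodedChange; // Avro-encoded 'user-space' change data
    final String source;
    final long primaryKey;
    DbusEvent(byte[] encodedChange, String source, long primaryKey) {
        this.encodedChange = encodedChange;
        this.source = source;
        this.primaryKey = primaryKey;
    }
}

class TransactionPublisher {
    /** Publishes one committed transaction as a single, atomic consistency window. */
    static void publish(EventBuffer buf, long commitScn, List<DbusEvent> rows) {
        buf.startEvents();
        try {
            for (DbusEvent row : rows) {
                buf.appendEvent(row);
            }
            buf.endEvents(commitScn); // consumers only see the events after this call
        } catch (RuntimeException e) {
            buf.rollbackEvents();     // a partial window is never exposed
            throw e;
        }
    }
}
```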
The Databus Change Event Stream
[Diagram: the Relay's in-memory Event Buffer and the Bootstrap's Log Store and Snapshot Store together form the change event stream, serving online and older changes.]
• Provides APIs to obtain change events
  – The query API specifies a logical clock (SCN) and a source: 'get change events greater than SCN'
  – Filtering at the source is possible: MOD and RANGE filter functions applied to the primary key of the event (see the sketch below)
  – Batching/chunking to guarantee progress
• Does not contain the state of consumers
• Contains references to metadata and schemas
• Implementation
  – HTTP server
  – Persistent connections to clients
  – REST API
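The MOD and RANGE filters mentioned above reduce to a cheap predicate on the event's primary key, evaluated at the relay before any bytes go over the wire, so a client receives only the partition(s) it asked for. These are generic sketches, not the library's actual filter classes.

```java
// Illustrative server-side filters on an event's primary key; only events whose
// key passes the predicate are streamed to the requesting client.
interface KeyFilter {
    boolean accept(long primaryKey);
}

// MOD filter: the client owning partition p of N receives keys with pk % N == p.
final class ModFilter implements KeyFilter {
    private final long numPartitions;
    private final long partition;
    ModFilter(long numPartitions, long partition) {
        this.numPartitions = numPartitions;
        this.partition = partition;
    }
    public boolean accept(long primaryKey) {
        return primaryKey % numPartitions == partition; // assumes non-negative keys
    }
}

// RANGE filter: the client receives keys within a contiguous [lo, hi) range.
final class RangeFilter implements KeyFilter {
    private final long lo;
    private final long hi;
    RangeFilter(long lo, long hi) { this.lo = lo; this.hi = hi; }
    public boolean accept(long primaryKey) {
        return primaryKey >= lo && primaryKey < hi;
    }
}
```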
Meta-data Management
• Event definition, serialization and transport
  – Avro
• Oracle, MySQL
  – The table schema generates the Avro definition (example below)
• Schema evolution
  – Only backwards-compatible changes allowed
• Isolation of applications from changes in the source schema
• Many versions of a source may be in use by applications, but only one version (the latest) of the change stream exists
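To make 'the table schema generates the Avro definition' concrete, here is a hypothetical Avro record for a member-profile row, loaded with the standard Avro Java API; the field names are invented for illustration, and the optional headline field shows the kind of backwards-compatible addition the pipeline allows.

```java
import org.apache.avro.Schema;

// Hypothetical Avro definition generated from a member-profile table.
public class MemberProfileSchema {
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"MemberProfile\",\"namespace\":\"com.example.events\","
      + " \"fields\":["
      + "   {\"name\":\"memberId\",\"type\":\"long\"},"
      + "   {\"name\":\"firstName\",\"type\":\"string\"},"
      + "   {\"name\":\"skills\",\"type\":{\"type\":\"array\",\"items\":\"string\"}},"
      // Optional field with a default: a backwards-compatible addition.
      + "   {\"name\":\"headline\",\"type\":[\"null\",\"string\"],\"default\":null}"
      + " ]}";

    public static Schema load() {
        return new Schema.Parser().parse(SCHEMA_JSON);
    }
}
```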
The Databus Relay
[Diagram: the Relay packages Change Capture, the in-memory Event Buffer, source metadata, an SCN store and the serving API; it reads the database schemas.]
• Encapsulates the change-capture logic and the change event stream
• Source aware, schema aware
• Multi-tenant: multiple Event Buffers representing the change events of different databases
• Optimizations
  – An index on SCN exists to quickly locate the physical offset in the EventBuffer (see the sketch below)
  – Locally stores the SCN per source for efficient restarts
  – Large Event Buffers possible (> 2 GB)
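The SCN index mentioned above can be thought of as a sorted map from the SCN that commits each consistency window to the byte offset where that window starts in the circular buffer, so 'give me everything after SCN s' becomes a single lookup. A generic illustration, not the relay's actual data structure:

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative SCN index for an in-memory event buffer: maps the SCN that
// commits each consistency window to its physical offset in the buffer.
final class ScnIndex {
    private final TreeMap<Long, Long> scnToOffset = new TreeMap<Long, Long>();

    /** Record where the window committed at this SCN begins. */
    void onWindowWritten(long scn, long bufferOffset) {
        scnToOffset.put(scn, bufferOffset);
    }

    /** Offset of the first window strictly newer than the client's SCN, or -1 if caught up. */
    long offsetForEventsAfter(long clientScn) {
        Map.Entry<Long, Long> next = scnToOffset.higherEntry(clientScn);
        return next == null ? -1L : next.getValue();
    }

    /** Drop index entries for windows the circular buffer has overwritten. */
    void evictThrough(long oldestRetainedScn) {
        scnToOffset.headMap(oldestRetainedScn, false).clear();
    }
}
```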
Scaling Databus Relay
• Peer relays, independent
  – Each relay reads directly from the DB
  – Increased load on the source DB with each additional relay instance
• Relays in a leader–follower cluster
  – Only the leader reads from the DB; followers read from the leader
  – Leadership assigned dynamically
  – Small period of stream unavailability during leadership transfer
The Bootstrap Service
• Bridges the continuum between stream and batch systems
• Catch-all for slow / new consumers
• Isolates the source instance from large scans
• The snapshot store has to be seeded once
• Optimizations
  – Periodic merge
  – Filtering pushed down to the store
  – Catch-up versus full bootstrap
• Guaranteed progress for consumers via chunking
• Multi-tenant: can contain data from many different databases
• Implementations
  – Database (MySQL)
  – Raw files
[Diagram: a Bootstrap Consumer pulls online changes from the Relay and maintains the Log Store and Snapshot Store; the Snapshot Store is seeded once from the database.]
The Databus Client Library
• Glue between the Databus change stream and the business logic in the Consumer
• Switches between relay and bootstrap as needed (see the sketch below)
• Optimizations
  – Change events are written with a batch write API, without deserialization
• Periodically persists the SCN for lossless recovery
• Built-in support for parallelism
  – Consumers need to be thread-safe
  – Useful for scaling large batch processing (bootstrap)
[Diagram: inside the client library, a fetcher reads from the Databus change stream into a local EventBuffer; a Dispatcher iterates over the buffer and issues callbacks to the registered stream and bootstrap consumers, persisting the SCN via an SCN store.]
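The 'switches between relay and bootstrap as needed' behaviour boils down to comparing the consumer's checkpoint SCN with what each tier still retains. A generic illustration of that routing decision (the type and method names here are invented for the sketch, not the client library's API):

```java
// Illustrative routing logic: consume from the relay when the checkpoint is
// still within the relay's retained window; otherwise fall back to bootstrap
// (catch-up from the log store, or a full snapshot if even the log has aged out).
enum StreamSource { RELAY, BOOTSTRAP_CATCHUP, BOOTSTRAP_SNAPSHOT }

final class SourceSelector {
    StreamSource select(long checkpointScn, long relayOldestScn, long bootstrapLogOldestScn) {
        if (checkpointScn >= relayOldestScn) {
            return StreamSource.RELAY;             // recent enough: online stream
        }
        if (checkpointScn >= bootstrapLogOldestScn) {
            return StreamSource.BOOTSTRAP_CATCHUP; // replay older changes from the log store
        }
        return StreamSource.BOOTSTRAP_SNAPSHOT;    // too far behind (or brand new): full snapshot
    }
}
```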
Databus Applications
[Diagram: one DatabusClient instance inside an application hosts consumers S1…Sn, each subscribed to an independent change stream.]
• Applications can process multiple independent change streams
  – Failure of one won't affect the others
• Different logic and configuration settings for bootstrap and online consumption are possible
• Processing can be tied to a particular version of a schema
• Able to override the SCN persisted by the client library
Scaling Applications - I
[Diagram: the change stream is partitioned by i = pk MOD N; one client application handles partitions 0…k-1, another handles partitions k…N-1. Client nodes are uniform and can process any partition(s); clients distribute the processing load.]
• Databus clients consume partitioned streams
  – Partitioning strategy: range or hash
  – Partitioning function applied at the source
• Number of partitions (N) and the list of partitions (i) are specified statically in configuration
  – Not easy to add/remove nodes
  – Needs a configuration change on all nodes
Scaling Applications - II
[Diagram: the Databus stream is partitioned by i = pk mod N; partitions are dynamically allocated, with N partitions distributed evenly amongst 'm' client nodes and the SCN written to a central location.]
• Databus clients consume partitioned streams
  – Partitioning strategy: MOD
  – Partition function applied at the source
• Number of partitions (N) and the cluster name are specified statically in configuration
  – Easy to add or remove nodes
  – Dynamic redistribution of partitions
  – Fault tolerance for client nodes
Databus: Current Implementation
• OS: Linux; written in Java; runs on Java 6
• All components have HTTP interfaces
• Databus Client: Java
  – Other language bindings possible
  – All communication with the change stream is via HTTP
• Libraries
  – Netty, for HTTP clients and servers
  – Avro, for serialization of change events
  – Helix, for cluster awareness
Sample Code: Simple Application
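The original slide is a screenshot, so the code itself is not in the transcript. Below is a hedged reconstruction of what wiring up a simple application looks like with the open-source client; DatabusHttpClientImpl, registerDatabusStreamListener, registerDatabusBootstrapListener and startAndBlock follow the patterns in the GitHub examples, but treat the exact configuration calls, port and source name as assumptions and verify them against https://github.com/linkedin/databus. The MyConsumer callback class is sketched under the next slide title.

```java
// Hedged sketch of a minimal Databus application (imports from the Databus
// client jars are omitted here; see the open-source repo for exact packages).
public class SimpleDatabusApplication {
    // Fully qualified source name this application subscribes to (illustrative).
    private static final String SOURCE = "com.example.events.MemberProfile";

    public static void main(String[] args) throws Exception {
        // Configure the client to talk to a relay serving our source
        // (host, port and relay id "1" are placeholders).
        DatabusHttpClientImpl.Config configBuilder = new DatabusHttpClientImpl.Config();
        configBuilder.getRuntime().getRelay("1").setHost("localhost");
        configBuilder.getRuntime().getRelay("1").setPort(11115);
        configBuilder.getRuntime().getRelay("1").setSources(SOURCE);

        DatabusHttpClientImpl client = DatabusHttpClientImpl.createFromCli(args, configBuilder);

        // Register the same callback object for online and bootstrap consumption.
        MyConsumer consumer = new MyConsumer();
        client.registerDatabusStreamListener(consumer, null, SOURCE);
        client.registerDatabusBootstrapListener(consumer, null, SOURCE);

        client.startAndBlock(); // runs until the process is shut down
    }
}
```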
Sample Code - Consumer
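The consumer slide is likewise an image; here is a hedged sketch of the callback class it most likely shows. The callback names mirror the Pub-Sub API slide, and AbstractDatabusCombinedConsumer, ConsumerCallbackResult, DbusEvent and DbusEventDecoder are taken from the open-source client, but the decoder call should be verified against the repo.

```java
// Hedged sketch of a Databus consumer; library imports omitted, and the
// signatures should be checked against the open-source client.
public class MyConsumer extends AbstractDatabusCombinedConsumer {

    @Override
    public ConsumerCallbackResult onDataEvent(DbusEvent event, DbusEventDecoder eventDecoder) {
        // Decode the Avro payload into a GenericRecord and hand it to business logic.
        org.apache.avro.generic.GenericRecord record = eventDecoder.getGenericRecord(event, null);
        processRecord(record); // e.g. update a search index, cache or read replica
        return ConsumerCallbackResult.SUCCESS;
    }

    @Override
    public ConsumerCallbackResult onBootstrapEvent(DbusEvent event, DbusEventDecoder eventDecoder) {
        // Bootstrap events carry the same user-space payload as online events.
        return onDataEvent(event, eventDecoder);
    }

    private void processRecord(org.apache.avro.generic.GenericRecord record) {
        System.out.println("change event: " + record);
    }
}
```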
Databus Performance: Relay
• Relay
  – Saturates the network with low CPU utilization
    • CPU utilization increases with more clients
    • Increasing the poll interval (at the cost of consumer latency) reduces CPU utilization
  – Scales to 100s of consumers (client instances)
Performance: Relay Throughput
Databus Performance: Consumer
• Consumer
  – Latency primarily governed by the 'poll interval'
  – Low overhead of the library in event fetch
    • Spikes in latency are due to network saturation at the relay
• Scaling the number of consumers
  – Use partitioned consumption (filtering at the relay)
    • Reduces network utilization, but adds some latency due to filtering
  – Increase the 'poll interval' and tolerate higher latencies
Performance: Consumer Throughput
Performance: End-End Latency
Databus Bootstrap: Performance
• Bootstrap
  – Should we serve from the 'catch-up store' or the 'snapshot store'?
  – Depends on where the traffic falls in the spectrum between 'all updates' and 'all inserts'
  – Tune the service depending on the fraction of updates and inserts
    • Favour snapshot-based serving for update-heavy traffic
Bootstrap Performance: Snapshot vs Catch-up
Databus at LinkedIn
[Diagram: Oracle and Espresso masters feed their respective change event streams, all operated as a managed Databus service.]
• The Databus change stream is a managed service
  – Applications discover / look up the coordinates of sources
• Multi-tenant, chained relays
• Many sources can be bootstrapped from SCN 0 (the beginning of time)
• Automated change stream provisioning is a work in progress
Databus at LinkedIn: Monitoring
• Available out of the box as JMX MBeans
• Metrics for health
  – Lag between the update time at the DB and the time at which the event was received by the application
  – Time of last contact with the change event stream and the source
• Metrics for capacity planning
  – Event rate / size
  – Request rate
  – Threads / connections
Databus at LinkedIn: The Good
• Source isolation: bootstrap benefits
  – Typically, data is extracted from sources just once (seeding)
  – The bootstrap service is used during the launch of new applications
  – The primary data store is not subject to unpredictable high loads due to lagging applications
• Common data format
  – Avro offers ease of use, flexibility and performance improvements (larger retention periods for change events in the Relay)
• Partitioned stream consumption
  – Applications horizontally scaled to 100s of instances
Databus at LinkedIn: Operational Niggles
• Oracle change capture performance bottlenecks
  – Complex joins
  – BLOBs and CLOBs
  – Contention on the trigger table driven by high update rates
• Bootstrap: snapshot store seeding
  – Consistent snapshot extraction from large sources
• Semi-automated change stream provisioning
Quick Review
• Specialization in data systems
  – The CDC pipeline is a first-class infrastructure citizen, up there with stores and indexes
• Source independent
  – Change capture logic can be plugged in
• Use of SCN, an external clock attached to the source
  – Makes the change stream more 'portable'
  – Easy for applications to reason about consistency with the source
• The Pub-Sub API supports the atomicity semantics of transactions
• Bootstrap Service
  – Isolates the source from abusive scans
  – Serves both streaming and batch use-cases
Questions
Additional Slides
The Timeline Consistent Data Flow problem
Databus: First attempt (2007)
Issues
• Source database pressure caused by slow consumers
• Brittle serialization
Editor's notes
  1. Large-scale storage systems, by nature, are distributed: data is stored on multiple machines. It is also often the case that the same data (the same content) is stored in different places to support different access patterns, efficient retrieval, or quick look-up of derived data (as opposed to computing it during a look-up). In order to have such distributed systems work together to provide a service, two things are needed: data flow between these systems (when data is updated on one system, it should be reflected in the other parts that store the same content) and data consistency (the different parts of the system must converge at some point in time). So, we need a change capture system that supports such a distributed system.
  2. Basically there are two ways changes can be captured: applications can dual-write data into the database and the change stream, or we can capture the change list from the commit logs that most databases have. Dual writes appear really easy on the surface, but once we start considering transient failure scenarios, achieving consistency gets harder, sometimes impossible; you may need to get into two-phase commits, etc., compromising on performance or availability. On the other hand, extracting changes from the database is almost like post-processing: minimal or no performance penalty, the application is unaware of the existence of a change capture system, and consistency is not an issue as long as you see the commit logs in their entirety. No such thing as a free lunch, of course: commit log formats are proprietary, and extracting from them can be tough. We chose the change-extract approach.
  3. Here are some of the use cases for Databus. In this picture, we have one source of truth for data – the primary database. Changes to the data are observed (or consumed) by consumers, which may then turn around and update derived data-serving systems. Or, data may be extracted into Hadoop (for example), to be re-loaded into some derived systems. At LinkedIn, our primary database is Oracle. For example, we have a database that holds member information; when rows in these databases are altered, the change events need to be propagated to a search index. The search index is used to serve queries from recruiters looking for appropriate candidates. Similarly there are other consumers of different databases, each having their own business logic to build derived data out of the primary database. Databus is used to capture these changes and provide them as change events to consumers. So, what would be the requirements of such a system?
  4. … Let’s look at a brief biography of Databus
  5. … Now for a close look at Databus
  6. The use of 'change' in 'change data' is more of a noun than a verb – 'data that has changed' rather than altering the data; it is the industry-standard term. Change capture logic (CDC) extracts changes in a consistent way (preserving consistency and ordering, e.g. ways to extract the order of commits, ID semantics). Publisher and subscriber APIs let the CDC transform the extracted changes and publish those events with the atomicity guarantees of the source... Applications preserved consistency when they applied the changes they received in a timely manner… and there were realities.
  7. Different types of applications; schemas evolving at the source. But the source cannot be burdened (a typical problem with V1), and applications cannot be forced to move to the latest version (which resulted in a proliferation of different versions of change streams for the same source).
  8. An external clock is attached to the source, with ordering defined by the source – e.g. commit ordering in Oracle gives increasing SCNs; in the MySQL binlog, increasing transactions can serve as the SCNs. There is no additional source of truth and no additional point of failure, and the event stream can be recreated given a source and an SCN. For applications, the ordering of events is the same as that seen by the source, so eventually the source and the apps will converge. The SCN is used to track progress on the app side, and apps can reason about consistency with the source because the external clock (the SCN) is logical, not tied to any particular change stream node. Apps need to be idempotent, as they can see a change more than once. Derived stores can also reason amongst each other, since they share SCN visibility – a concept that is useful for comparing consistency across applications. Timeline consistency: an at-least-once guarantee, the order of change events is the same as in the source DB, no updates are missed, and all apps listening to the change stream see the same order of change events.
  9. Pull model, as opposed to push, where producers keep track of their consumers' progress and call clients as long as they are available: the pull model assumes the state required to serve a request lies with the consumer. Restartability is easier, as state can be computed from (source, SCN) on any machine; this is true at both the change event stream and the consumer. Separation of concerns between the 'online consumption' use case (recent changes) and the 'catch-up / bootstrap' case (where older changes are required), which have different scalability properties. Isolate sources and consumers – the source can move, schemas can change, and of course producer and consumption speeds can differ vastly. We are not just transport – we support metadata, such as schemas. We ensure that consumers have a good experience while the change stream also becomes more manageable, ultimately helping provisioning and consumer robustness. It also gives the option of adding more filtering at the change stream.
  10. Point: change capture is within the relay – each relay is self-sufficient, i.e. since eventBufferState = fn(source, SCN), it has the change capture logic to pull in the changes; if change capture were outside, then the change capture logic / fan-out would have to take care of replication or write to a leader-follower relay cluster. The EventBuffer wraps around if it runs out of memory. Point: the client library fetches changes from the Databus stream (which is now the Relay and the Bootstrap Service). Point: workload separation – recent changes, older changes, snapshots – we cannot rely solely on all changes fitting into memory. Point: the Bootstrap Consumer is a special application that listens to changes and updates its log store (persistent change events) and snapshot store (a persistent copy of the database that stores change events in user-space). Point: remember, the client library automatically switches to the appropriate service – relay or bootstrap – depending on the SCN requested by the application. Point: metadata is used by the relay stream (schema awareness). Point: DBs are saved from abusive scans from lagging consumers (isolation). Counterpoints: a push model means additional state about consumers, or the speed of consumption and the speed of production have to match; it is harder to maintain lossless guarantees.
  11. Are all of these open-sourced? Oracle is.
  12. Custom MySQL replication / MySQL slave instance – specifically, the custom storage engine of the slave writes to a TCP channel instead of disk. The slave state has an SCN (offset, log number) that can be controlled, with up to 3 days' worth of rewindability (configurable).
  13. The control flow is depicted. Note the pull model: at the CDC end, it's easy to make data portable – (source, SCN) is sufficient to re-create the state in the EventBuffer, which means easier restarts; no state needs to be maintained on the upstream system about its subscribers (as in the case of a push model). Publish does not require persistence/durability guarantees – these are obtained from the source of truth and the fact that the change stream is f(Source, SCN). At the end points, the CDC captures changes from the database and publishes to an event buffer; at the other end, applications subscribe to the change stream and receive callbacks when change data from the sources they have subscribed to becomes available. Point: the end points have APIs supporting transaction semantics (atomicity). Point: a 'window' or consistency window is a point in the stream that is consistent with the source at the specified SCN. Point: consumers 'see' events one consistency window at a time, i.e. events become visible to consumers after 'end of window' has been written. Question: what if CDC were outside? CDC can be a pull model – but can it push, off-box, to the event stream? Yes, but then the event stream isn't simple – cluster state (leader-follower) is shared amongst CDC and Relay, and failure of CDC would need to be treated and monitored separately. Packaging CDC with the event stream has operability advantages. Question: is onRollback() triggered at the same time the rollback was seen in the buffer? No – this isn't about a one-to-one correspondence in time, but in semantics: both have the notion of 'transactions', apps don't see events that were not committed, and the output of apps has the option of exposing the whole transaction in its entirety as well – very important, for example, in relay chaining.
  14. Both the Relay (online changes) and the Bootstrap (older changes) together constitute the change event stream. They do not share the exact same API, but semantically say the same thing: get events since a point in the logical clock. Both have the ability to perform simple filters on the service side, both have chunking/progress guarantees, and both are HTTP-based implementations with efficient communication to clients.
  15. Database schemas are converted to a neutral format for the Databus events; we chose Avro. Tools are available to publish schemas to a 'schema registry' and to generate schemas from different source types. Schemas are generated and stored in a place accessible by the change stream, which ensures backward compatibility – relevant for bootstrap. Schemas are also available to consumers for deserialization.
  16. The Relay encapsulates the change capture logic, the event buffer (remember the publish API), implemented as a circular buffer, and the metadata. It constitutes the online, most frequently used part of the change stream, addressing 98% of requests on a typical day.
  17. Relays talk to the database directly, since they contain the change capture. This has horizontal scalability limits.
  18. The Bootstrap Consumer is a special application that consumes events from relays and writes to a persistence layer called the 'log store'. Another process applies changes to the snapshot store, using the primary key that was provided in the publish API (a separate thread, not shown here). Seeding: bootstrapping the bootstrap.
  19. Databus Client Library: orchestrates consumption of the change stream from bootstrap/relay; uses an HTTP fetch to get events from upstream and writes them to the event buffer using an efficient readEvents call; currently uses a polling mechanism to get events from upstream; the Dispatcher uses the iterator interface of the EventBuffer to read the events and then calls the user-specified consumer implementations; the client library by default persists the SCN for lossless recovery; consumers need to be thread-safe and can take advantage of parallelism. … Let's look at a typical application.
  20. Key: a single instance of the client library can handle multiple consumers subscribing to multiple change streams. Different logic and tuning are required for the bootstrap and online cases – facilities are provided. Schema-aware apps can force type conversion from one schema to another, as long as backward compatibility is preserved amongst the change data. Overriding the persisted SCN is possible: in cases where flush() is not guaranteed by the application (e.g. an index), apps store the SCN in the index and retrieve it on startup. Applications typically are distributed, so they have a notion of some sort of partitions / partition awareness. It can be tempting to consume the entire event stream of an unpartitioned upstream store and then drop (n-1)/n of the partitions on the floor; that is inefficient and expensive (for relay and consumer, latency-wise, as we shall see). Instead…
  21. Here, client nodes refer to one instance of the client library, so that can be an application instance. Applications themselves are partition-aware and write to partitioned indexes/stores; they need to distribute the processing load. The partition function is applied at the source on the primary key – it is applied on the fly, and the source itself needn't be partitioned. Partitions can be changed as more nodes are added, if the application accounts for 'repartitioning': checkpoints need to be reset and the configuration needs to be changed. But this is hardly operations-friendly…
  22. The clients are partition-aware, but the partition assignment is dynamic. Cluster awareness is introduced (client app clusters), with operability advantages – the ability to add/remove nodes with dynamic redistribution. Helix is used to manage client clusters, and as the SCN store. … Now, let's look at some aspects of the current implementation.
  23. … And on to some code – let’s take a look at the application
  24. Points to note: how a source is specified; how sources are registered; a consumer; and how a Databus client uses subscription (register).
  25. Key: show how the payload is extracted. We have visited these APIs earlier. … Now to dwell on performance.
  26. Setup: measure relay serving throughput and CPU utilization; vary the number of consumers and the poll interval (tpt_10 means throughput with a poll interval of 10 ms, cpu_55 is CPU utilization with a 55 ms poll interval, etc.); consumers pull at max speed (no additional processing); event size is 2.5 KB; no write traffic – the relay buffer is pre-filled. The hypothesis was that we can support more consumers if the poll interval is long, and that is confirmed by the observations: the relay can easily saturate the network with minimal CPU utilization; once the network is saturated, CPU increases with the number of consumers due to networking overhead (context switching); even with 200 consumers, CPU utilization is less than 10%; higher poll intervals generally lead to less CPU utilization.
  27. Setup: measure the read throughput of each consumer with update traffic on the relay; vary the number of consumers and the update rate; consumers pull at max speed (no additional processing); the poll interval is 10 ms; event size is 2.5 KB. Observations: drops mean a consumer is no longer able to keep up; the reason is network saturation on the relay side, e.g. 2000 updates/s * 20 consumers * 2.5 KB = 100 MBps < max network bandwidth < 200 MBps = 2000 updates/s * 40 consumers * 2.5 KB.
  28. Setup: same as above, but measure the time in milliseconds for events to reach the consumer; partitioning through server-side filtering was added to see what happens if the network is not a bottleneck. Observations: latency knees due to relay network saturation as before; latency without SSF (server-side filtering) is around 10-20 ms (including an average 5 ms overhead due to the poll interval); with SSF the network is no longer a bottleneck, and latency goes up to 15-45 ms due to SSF computation overhead. So, the relay can scale to hundreds of consumers if they can tolerate a little bit of latency.
  29. End-to-end latency has no meaning for the bootstrap service, and it can easily saturate the network with multiple clients. So, we focused on comparing serving out of the log store vs the snapshot store. Setup: compare serving deltas vs serving all updates; synthetic workload; vary the number of updates to existing keys vs new keys (i.e. inserts). Observations: catch-up time is constant, as it does not distinguish updates vs inserts; the break-even point is around 1:1 updates vs inserts; for a small number of inserts, the benefit of snapshot is overwhelming. The break-even point seems to be when half of the changes are updates. We monitor the update rate in production and tune the bootstrap service.
  30. Databus stream for Oracle: things that scale with memberId and things that scale with connections (multiplicative, only inserts); small sources – advertiser data (but consistency is important). Applications: search – multiple instances, a large distributed deployment, a low-latency requirement, consistency. Bootstrap is used in new ways – to automatically provision new index nodes, new in-memory advertising data sets, and to fix legacy stores. Espresso: source of truth for 2013 and beyond – a partitioned primary data store (transactional) based on a MySQL storage engine, horizontally scalable; the change stream is partitioned at the source of truth rather than at the change stream. The change stream still requires trigger-based 'databusification' in Oracle. Relay provisioning is still manual, in the sense that there is no self-serve mechanism to specify a source or automatic source discovery that will trigger it; relays are being provisioned in the 'cloud' depending on capacity estimates. Let's look at some change capture implementations we have…
  32. Overall: is external clock propagation a good idea? Is it necessary or a nice-to-have? It becomes important in the case of bootstrap. Are checkpoints portable? If a mapping exists between the SCN and a CDC-generated unique number, or if an index on SCN exists at every layer (bootstrap and relay), then it can be handled as a system-level implementation and the client needn't use the SCN explicitly. The SCN – an external clock – is a convenient way of storing logical state across instances of the change stream.