SlideShare una empresa de Scribd logo
1 de 60
THE UNBUNDLED
DATABASE
Leveraging the
unbundled database
via distributed logs
and stream processing
2|
WhoAm I?
Software and data engineering at
Rackspace Hosting
Software engineering at WDPROData Infrastructure at Pluralsight
3|
Pluralsight
4|
Table of Contents
4 17 22 30 39
page page page page page
52
page
Microservicesoverview EventDrivenServices TheDistributedLog KafkaLogSemantics TheUnbundledDatabase
StreamProcessing
Microservices Background
Challenges
Datadichotomy
Streams
6|
6
Why?
SCALABILITY
7|
Independence
comes at a cost COMPLEXITY
Independenceisadouble-edged sword.
BOUNDARIES
8|
Services depend on
each other
Most business services share the
same notions of core facts.
This makes their futures inevitably
connected.
Over time, services may become
unable to retain the same clear
separation of concerns.
Servicesareinherentlypartofabigger,
interconnectedecosystem.
9|
Data on the “inside”
vs
Data on the “outside”
Data on the inside
Encapsulated private data contained
within a service.
Data on the outside
Information that flows between
independent services.
How do
services
share data?
Three well-known approaches:
Serviceinterfaces
Messaging
Shareddatabases
11|
Service Interfaces
Goal is to clearly separate concerns
between services and define different
bounded contexts.
Synchronized changes are hard!
Dataandfunctionalityareencapsulatedinthe
service.
12|
Messaging
Middleware
Messaging architectures can scale well.
Even though messaging architectures
can move massive amounts of data,
they don’t provide any historical
context, which can lead to data
divergence over time.
Dataandfunctionalityarescatteredacross
organization.
13|
Shared Databases
Shared databases concentrate too
much data in a single place.
For microservices, databases create an
usually strong rich coupling.
This is due to the broad interface that
databases expose to the outside world.
Functionalityisencapsulatedwithintheservice;data
isnot.
14|
Data Dichotomy
Service interfaces minimize the data they
expose to the outside.
Database interfaces, on the other hand,
tend to amplify the data they hold.
Datasystemsareaboutexposingdata.
Servicesareabouthidingit.
interface
Data on the inside
Data on the outside
interface
Data on the inside
Data on the outside
Databases Services
15|
Data Diverges
Overtime
Different services make different
interpretations of the data they
consume, which leads to divergent
information.
Services also keep that data around;
data is altered and fixed locally and soon
it doesn’t represent the original dataset
anymore.
16|
Looking ahead:
Sharing data with
distributed logs
SHOPPINGCART
SERVICE
CATALOG
SERVICE
USER
SERVICE
FULFILLMENT
SERVICE
USERCOMMIT LOG
RETURNS
SERVICE
WRITESTO
REPLICATES
REPLICATES
REPLICATES
REPLICATES
Eventsarebroadcasttoalog.
Event Driven
Services
18|
Ways services
interact
Queries
Requests to look up some data point.
Queries are side effect free and leave
the state of the system unchanged.
Events
Events can be thought of are both a
fact and a trigger.
They express something that has
happened, usually in the form of a
notification.
Commands
Commands are actions in the form of
side effect generating requests
indicating some operation to be
performed by another service.
Commands expect a response.
19|
Event Driven
Services
Broadcast events to a centralized,
immutable stream of facts.
Downstream freedom to react, adapt
and change to any consumer.
There are also some other interesting
gains that event driven services
provide, such as Exactly Once
Processing.
This paradigm is in a departure from
request-driven services, where flow
resides in commands and queries.
IbroadcastwhatIdid!
20|
Advantages
Decoupling
Both data producers and consumers
are completely decoupled. There is
no API binding them together, no
synchronized changes that need to
be performed.
Locality
Queries and lookups are local to the
bounded context and can be
optimized in the best way that fits the
current use case.
State Transfer
Events are both triggers and facts,
that can be used to notify and
propagate entity state transfers.
21|
The Single Writer
Principle
Having a single code path helps with
data quality, consistency, and other
data sharing concerns.
This is important because these
events represent durable shared
facts.
Asingleserviceownsalleventsfora
singletype.
EVENT STREAM/ LOG
MATERIALIZEDVIEWS/CACHE
HADOOP
ETL
SERVICE
TRANSF
Writesto
Replicatesto
The
Distributed
Log
Logconcepts
Kafka
Topicsandpartitions
23|
What’s a log?
Reads and writes are sequential
operations.
They are, therefore, sympathetic to
the underlying media, leveraging
pre-fetch, the various layers of
caching and naturally batching
similar operations together.
Ordered,immutablesequenceofrecordsthatis
continuouslyappendedto.
24|
Writing to the Log
Data is stored in the log as stream of
bytes.
Due to their structure, logs can be
optimized.
For instance, when writing data to
Kafka, data is copied directly from the
disk buffer to the network buffer,
without any memory cache.
Writesareappendonly,alwaysaddedtotheheadof
thelog.
25|
Reading from the
Log
Both reads and writes are sequential
operations.
Messages are read in the order they
were written.
Consumers are responsible for
periodically recording their position
in the log.
Since the log is durable, messages
can be replayed for as long as they
exist in the log.
Readsareperformedbyseekingtoaspecific
positionandsequentiallyscanning.
Seek to offset and SCAN
Only sequential access
26|
Kafka
Adistributedstreamingplatform
27|
Key Capabilities
Storage
Stores streams of records using
replicated, fault-tolerant, durable
mechanisms.
Persists all published records —
whether or not they have been
consumed, using a configurable
retention period.
Processing
Kafka can process and apply logic to
streams of records as they occur.
Publish / Subscribe
Kafka is like a message queue or
enterprise messaging system, but
with some very distinct design
concerns and side effects.
28|
The Kafka Broker
Resilient
Retries, message acknowledgement,
ack strategies, are all baked into the
platform.
Fault Tolerant
Messages are replicated across different
nodes.
Linearly Scalable
Scaling is a matter of adding more
nodes to an existing cluster.
Rebalancing, leader election, and
replication are automatically adjusted.
BROKER BROKER BROKER BROKER
29|
Topics and
Partitions
Topics
Split into ordered commit logs called
partitions.
Data in a topic is retained for a
configurable period of time.
Partitions
Each message is assigned a
sequential id called an offset.
Allow the logs to scale beyond a size
that will fit a single broker.
Act as the unit of parallelism.
Topicsarecategoriesorfeednamestowhich
recordsarepublishedto.
Kafka Log
Semantics
Orderingguarantees
Messagedurability
Loadbalancing
Compaction
Storage
Topictypes
31|
Ordering guarantees
Relative Ordering
Messages that require relative
ordering must be sent to the same
partition.
Message keys will map the same
partition.
Global Ordering
Requires a single partition topic.
Tends to come up when migrating
legacy systems where global ordering
was an assumption.
Throughput limited to a single
machine.
Mostbusinesssystemsneedstrongordering
guarantees.
Service
Service
Keys map to the partitions
Consumer
Consumer
Consumer
Group
Consumers in a group are responsible for a
single partition, so ordering is guaranteed.
32|
Message Durability
Messages are written to a leader and
then replicated to a user-defined
number of brokers.
Records can be configured to be
persisted for a period of time or
based on keys.
Kafkaprovidesdurabilitythroughreplication.
Producer
33|
Kafka can load
balance services
If a consumer leaves a group for any
reason, Kafka will detect this change
and re-balance how messages are
distributed across the remaining
consumers.
If the failed consumer comes back
online, load is balanced again.
Kafka assigns whole partitions to
different consumers.
In other words, a single partition can
only ever be assigned to a single
consumer.
Since this is always true, ordering is
guaranteed, across failures and
restarts.
Loadbalancingprovideshighavailability
Consumer
Consumer
Consumer
Group
34|
Compaction
‘Compacted Topics’ retain only the
most recent events, with any old
events, for a certain key removed.
They also support deletes.
Compacted tops reduce how quickly
a dataset grows, reducing storage
requirements while also increasing
performance of replication jobs.
Key-baseddatasetscanbecompacted
35|
Topic Durability
If you set the retention to a ”forever”
or enable log compaction on a topic,
data will be kept for all time.
Some use cases that support this are
event sourcing, in-memory caches
(with compacted topics), stream
processing or change data capture.
Kafkacanbeusedasalong-termstoragelayer
36|
Public and Private
Topics
Some teams prefer to do this by
convention, but a stricter segregation
can be applied using the
authorization interface.
Assign read/write permissions for
private topics only to the services
that own them.
Publicandprivatetopicsshouldbeseparatedfrom
eachother.
Public Topics
Private Topics
Processors
37|
Use a Schema
Registry
A schema registry provides a
centralized repository for stream
metadata.
Having a schema registry helps with
data management, data discovery
and automatic data pipelines.
A schema registry can also help with
proper guards around schema
evolution, caching, storage and
computation efficiency.
Alwaysuseschemastopromoteadurable,shareable
contractbetweenproducersandconsumers.
38|
Serializing
Messages withAvro
Why Avro?
Avro is a rich library and messaging
protocol that supports direct mapping
to/from JSON.
It is also space efficient and fast, with wide
industry support across different
languages.
Avro can also support automated
pipelines for data replication.
Avro records are also evolvable.
Opensourcedataserializationprotocolthathelps
withdataexchangebetweensystems,programming
languages,andprocessingframeworks.
The
Unbundled
Database
40|
40
Databases
“We have been using the database as a kind of gigantic, global,
shared, mutable state. It’s like a global variable that’s shared
between all your application servers.”
- Martin Kleppmann
41|
Transactions
Transactions appears to run in isolation,
completely separated from each other.
If any part of the system fails, each
transaction is either executed in its
entirety or not all.
AsequenceofoneormoreSQLoperationsare
treatedasasingleoperation.
42|
WHYACID?
What consistency
do you really need
and when?
ACID is old school!
43|
ACID 2.0
Idempotent
aa = a
map.put("key1",
"value1").put("key1", "value1")
always result in a single entry.
Distributed
(ab)c = (ac)b, for concurrent c
and b
Mostly symbolical.
Commutative
ab = ba
max(1, 2) = max(2, 1)
Associative
a(bc) = abc
Set(a).add(b) = set(b).add(a)
44|
Indexes
Indexes quickly locate data without
having to scan every row in a database
table, but incur the cost of additional
writes and storage space to maintain
the index data structure.
Datastructurethatincreasesthespeedofdata
lookupoperations.
45|
Views
Database users can query views just as
they would any other persistent
database object.
Views can be materialized or virtual.
Theresultsetofastoredqueryonthedata
46|
Materialized Views
Whenever any of the underlying data
changes, the materialized view is
updated too.
A view precomputes the query to get the
data in exactly the right form for your
use case. When it comes to querying the
view, all the hard stuff is already done.
Querycachedandrunbythedatabase
47|
Databases shouldn’t
be shared
Databases introduce an unusually
strong type of coupling.
This comes from the broad, magnifying
interface that these systems expose to
the outside world.
As services interact with the rich
interface provided by databases, they
get sucked in. Service and database
becoming increasingly intertwined.
Databases are pools of global, shared mutable state.
48|
The Unbundled
Database
A database is composed of several
concepts rolled into a single logical
unit: storage, indexing, caching,
query, and transactions.
Unbundling is breaking out these
components and recomposing in a
way that is more sympathetic to the
target system.
Splittingdatabaseconcernsintodifferentlayers.
49|
Rethinking the
Materialized View
We will need:
1. The ability to write data
transactionally into a log that maintains
immutable records of these writes.
2. A query engine that replicates the
journal into a view that can be queried.
All these elements need to be
decentralized and operate as
independent entities.
Kafka can handle both the log structure
and atomic writes.
Stream processing engines are a great
fit for the role of the query engine.
.
Canwethinkofmaterializedviewsascontinuously
updatedcaches?
50|
Unbundled
Databases are safe
to share
The loose coupling comes from the
simple interface the log provides:
seek and scan.
This makes the dominant source of
coupling the data itself, freeing both
sides of the equation from tight
coupling.
Kafka becomes the centralized event
stream log.Thelogprovidesasafe,immutable,low-coupling
mechanismtosharedatasetsacrossservices.
Service
Kafka
Data
Create using some
Query
Stream Processing
Query is always running
51|
Creating Service
specific views
These materialized views don’t need
to be long-lived.
A distributed log ‘remembers’, which
means the views don’t have to.
Views can be ephemeral,
implemented as simple caches that
can be thrown away and rebuilt at
anytime from the log.
Unbundleddatabasesofferanapproachwheretrade
offsbetweenreadsandwritesarenolongerneeded.
Service
Many different “custom” views.
Service
Service
Service
Stream
Processing
53|
Remember…
Services broadcast events
Services are not modeled as a
collection of remote requests and
commands.
They become a cascade of
notifications, decoupling each event
source from its downstream
destination.
Events are triggers and facts
Events make up a narrative that not
only describes the evolution of the
business domains over time, they
also represent full datasets.
54|
Kafka Streams
Low overhead.
Stream processing meant to run as
part of the application.
Built for reading data from Kafka
topics, processing it, and writing the
results to back to Kafka.
Uses the Kafka cluster for
coordination, load balancing,
and fault-tolerance.
KSQL: SQL-like interface.
Ideal for ETL (KStreams) and
aggregations (KTables).
LibrarythatanystandardJavaapplicationcan
embed.
55|
Akka Streams
Based on Akka actors.
Designed for general purpose
microservices.
Very low latency.
Mid volume, complex data pipelines.
Efficient per-event processing
Very rich ecosystem:
Alpakka - connect to almost
everything! (dbs, files, etc.)
Akka cluster and Akka persistence
Scala/Javaimplementationofreactivestreams
56|
Apache Flink
Operator-based computational
model.
Large scale.
Automatic data partitioning.
Exactly-once processing.
Very low latency.
Handles high data volumes: 1M/sec.
Can run batch jobs.
Clusteredstreamprocessingengine.
57|
Apache Spark
Micro batches computing model.
Large scale and automatic
partitioning.
Medium latency, evolving to low.
Handles high data volumes: 1M/sec
Rich SQL, ML options.
Very mature; huge community
support and drivers across a variety
of programming languages.
Clustercomputingframeworkwiththelargestglobal
userbase.WritteninScala,Java,RandPython.
58|
Use Case:
Replication Streams
Builds materialized views from the
writes in the log.
Replication stream behaves like a
database transaction log.
Views are like database secondary
indexes: optimized for querying and
reading.
There could be many different shapes
of the same data: a key-value store, a
full-text search index, a graph index,
an analytics system, and so on.
Createsin-sync,read-onlymaterializedviewsofthe
underlyingloginaformatthat’smostsympatheticto
thetargetsystem.
59|
Even more
generic…
Source: http://office.microsoft.com/en-us/excel-help/present-your-data-in-a-column-chart-HA010218663.aspx
MaterializedViewsandStream
Processing
HYDRA
BROKER
INGESTION
Customer
HYDRA
STREAMDISPATCH
{ }
/dsls
Invoices Returns
contact information
alex-silva@pluralsight.com
thank you
http://linkedin.com/in/alexvsilva
@thealexsilva

Más contenido relacionado

La actualidad más candente

Reactive Revealed Part 2: Scalability, Elasticity and Location Transparency i...
Reactive Revealed Part 2: Scalability, Elasticity and Location Transparency i...Reactive Revealed Part 2: Scalability, Elasticity and Location Transparency i...
Reactive Revealed Part 2: Scalability, Elasticity and Location Transparency i...Legacy Typesafe (now Lightbend)
 
High-Speed Reactive Microservices
High-Speed Reactive MicroservicesHigh-Speed Reactive Microservices
High-Speed Reactive MicroservicesRick Hightower
 
The Data Dichotomy- Rethinking the Way We Treat Data and Services
The Data Dichotomy- Rethinking the Way We Treat Data and ServicesThe Data Dichotomy- Rethinking the Way We Treat Data and Services
The Data Dichotomy- Rethinking the Way We Treat Data and Servicesconfluent
 
Akka for big data developers
Akka for big data developersAkka for big data developers
Akka for big data developersTaras Fedorov
 
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...HostedbyConfluent
 
What is Reactive programming?
What is Reactive programming?What is Reactive programming?
What is Reactive programming?Kevin Webber
 
Data Consitency Patterns in Cloud Native Applications
Data Consitency Patterns in Cloud Native ApplicationsData Consitency Patterns in Cloud Native Applications
Data Consitency Patterns in Cloud Native ApplicationsRyan Knight
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드confluent
 
Dataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayDataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayJosef Adersberger
 
Microservices Antipatterns
Microservices AntipatternsMicroservices Antipatterns
Microservices AntipatternsC4Media
 
Microservices Testing Strategies JUnit Cucumber Mockito Pact
Microservices Testing Strategies JUnit Cucumber Mockito PactMicroservices Testing Strategies JUnit Cucumber Mockito Pact
Microservices Testing Strategies JUnit Cucumber Mockito PactAraf Karsh Hamid
 
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On Kubernetes
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On KubernetesHow To Build, Integrate, and Deploy Real-Time Streaming Pipelines On Kubernetes
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On KubernetesLightbend
 
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...confluent
 
Nats meetup sf 20150826
Nats meetup sf   20150826Nats meetup sf   20150826
Nats meetup sf 20150826Apcera
 
How easy (or hard) it is to monitor your graph ql service performance
How easy (or hard) it is to monitor your graph ql service performanceHow easy (or hard) it is to monitor your graph ql service performance
How easy (or hard) it is to monitor your graph ql service performanceRed Hat
 
Building Information Systems using Event Modeling (Bobby Calderwood, Evident ...
Building Information Systems using Event Modeling (Bobby Calderwood, Evident ...Building Information Systems using Event Modeling (Bobby Calderwood, Evident ...
Building Information Systems using Event Modeling (Bobby Calderwood, Evident ...confluent
 
APAC Kafka Summit - Best Of
APAC Kafka Summit - Best Of APAC Kafka Summit - Best Of
APAC Kafka Summit - Best Of confluent
 
NoSql presentation
NoSql presentationNoSql presentation
NoSql presentationMat Wall
 

La actualidad más candente (20)

Reactive Revealed Part 2: Scalability, Elasticity and Location Transparency i...
Reactive Revealed Part 2: Scalability, Elasticity and Location Transparency i...Reactive Revealed Part 2: Scalability, Elasticity and Location Transparency i...
Reactive Revealed Part 2: Scalability, Elasticity and Location Transparency i...
 
High-Speed Reactive Microservices
High-Speed Reactive MicroservicesHigh-Speed Reactive Microservices
High-Speed Reactive Microservices
 
The Data Dichotomy- Rethinking the Way We Treat Data and Services
The Data Dichotomy- Rethinking the Way We Treat Data and ServicesThe Data Dichotomy- Rethinking the Way We Treat Data and Services
The Data Dichotomy- Rethinking the Way We Treat Data and Services
 
Akka for big data developers
Akka for big data developersAkka for big data developers
Akka for big data developers
 
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...
Redis and Kafka - Simplifying Advanced Design Patterns within Microservices A...
 
What is Reactive programming?
What is Reactive programming?What is Reactive programming?
What is Reactive programming?
 
Data Consitency Patterns in Cloud Native Applications
Data Consitency Patterns in Cloud Native ApplicationsData Consitency Patterns in Cloud Native Applications
Data Consitency Patterns in Cloud Native Applications
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
 
Dataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayDataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice Way
 
KrakenD API Gateway
KrakenD API GatewayKrakenD API Gateway
KrakenD API Gateway
 
Microservices Antipatterns
Microservices AntipatternsMicroservices Antipatterns
Microservices Antipatterns
 
Microservices Testing Strategies JUnit Cucumber Mockito Pact
Microservices Testing Strategies JUnit Cucumber Mockito PactMicroservices Testing Strategies JUnit Cucumber Mockito Pact
Microservices Testing Strategies JUnit Cucumber Mockito Pact
 
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On Kubernetes
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On KubernetesHow To Build, Integrate, and Deploy Real-Time Streaming Pipelines On Kubernetes
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On Kubernetes
 
Microservices deck
Microservices deckMicroservices deck
Microservices deck
 
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
 
Nats meetup sf 20150826
Nats meetup sf   20150826Nats meetup sf   20150826
Nats meetup sf 20150826
 
How easy (or hard) it is to monitor your graph ql service performance
How easy (or hard) it is to monitor your graph ql service performanceHow easy (or hard) it is to monitor your graph ql service performance
How easy (or hard) it is to monitor your graph ql service performance
 
Building Information Systems using Event Modeling (Bobby Calderwood, Evident ...
Building Information Systems using Event Modeling (Bobby Calderwood, Evident ...Building Information Systems using Event Modeling (Bobby Calderwood, Evident ...
Building Information Systems using Event Modeling (Bobby Calderwood, Evident ...
 
APAC Kafka Summit - Best Of
APAC Kafka Summit - Best Of APAC Kafka Summit - Best Of
APAC Kafka Summit - Best Of
 
NoSql presentation
NoSql presentationNoSql presentation
NoSql presentation
 

Similar a Leveraging the power of the unbundled database

Kafka-and-event-driven-architecture-OGYatra20.ppt
Kafka-and-event-driven-architecture-OGYatra20.pptKafka-and-event-driven-architecture-OGYatra20.ppt
Kafka-and-event-driven-architecture-OGYatra20.pptInam Bukhary
 
Kafka and event driven architecture -og yatra20
Kafka and event driven architecture -og yatra20Kafka and event driven architecture -og yatra20
Kafka and event driven architecture -og yatra20Vinay Kumar
 
Kafka and event driven architecture -apacoug20
Kafka and event driven architecture -apacoug20Kafka and event driven architecture -apacoug20
Kafka and event driven architecture -apacoug20Vinay Kumar
 
Whitepaper : Building an Efficient Microservices Architecture
Whitepaper : Building an Efficient Microservices ArchitectureWhitepaper : Building an Efficient Microservices Architecture
Whitepaper : Building an Efficient Microservices ArchitectureNewt Global Consulting LLC
 
Pivotal gem fire_wp_hardest-problems-data-management_053013
Pivotal gem fire_wp_hardest-problems-data-management_053013Pivotal gem fire_wp_hardest-problems-data-management_053013
Pivotal gem fire_wp_hardest-problems-data-management_053013EMC
 
Scalable Fault-tolerant microservices
Scalable Fault-tolerant microservicesScalable Fault-tolerant microservices
Scalable Fault-tolerant microservicesMahesh Veerabathiran
 
Designing distributed systems
Designing distributed systemsDesigning distributed systems
Designing distributed systemsMalisa Ncube
 
A scalable and reliable matching service slide
A scalable and reliable matching service slideA scalable and reliable matching service slide
A scalable and reliable matching service slidesomnath goud
 
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...apidays
 
A Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System UptimeA Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System UptimeYogeshIJTSRD
 
Event Driven Software Architecture Pattern
Event Driven Software Architecture PatternEvent Driven Software Architecture Pattern
Event Driven Software Architecture Patternjeetendra mandal
 
Implementing Domain Events with Kafka
Implementing Domain Events with KafkaImplementing Domain Events with Kafka
Implementing Domain Events with KafkaAndrei Rugina
 
A scalable and reliable matching service for content based publish subscribe ...
A scalable and reliable matching service for content based publish subscribe ...A scalable and reliable matching service for content based publish subscribe ...
A scalable and reliable matching service for content based publish subscribe ...somnath goud
 
Stateful on Stateless - The Future of Applications in the Cloud
Stateful on Stateless - The Future of Applications in the CloudStateful on Stateless - The Future of Applications in the Cloud
Stateful on Stateless - The Future of Applications in the CloudMarkus Eisele
 
Removing performance bottlenecks with Kafka Monitoring and topic configuration
Removing performance bottlenecks with Kafka Monitoring and topic configurationRemoving performance bottlenecks with Kafka Monitoring and topic configuration
Removing performance bottlenecks with Kafka Monitoring and topic configurationKnoldus Inc.
 
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQCluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQShameera Rathnayaka
 
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...HostedbyConfluent
 
Postponed Optimized Report Recovery under Lt Based Cloud Memory
Postponed Optimized Report Recovery under Lt Based Cloud MemoryPostponed Optimized Report Recovery under Lt Based Cloud Memory
Postponed Optimized Report Recovery under Lt Based Cloud MemoryIJARIIT
 
Reactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/SubscribeReactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/SubscribeSumant Tambe
 

Similar a Leveraging the power of the unbundled database (20)

Kafka-and-event-driven-architecture-OGYatra20.ppt
Kafka-and-event-driven-architecture-OGYatra20.pptKafka-and-event-driven-architecture-OGYatra20.ppt
Kafka-and-event-driven-architecture-OGYatra20.ppt
 
Kafka and event driven architecture -og yatra20
Kafka and event driven architecture -og yatra20Kafka and event driven architecture -og yatra20
Kafka and event driven architecture -og yatra20
 
Kafka and event driven architecture -apacoug20
Kafka and event driven architecture -apacoug20Kafka and event driven architecture -apacoug20
Kafka and event driven architecture -apacoug20
 
Whitepaper : Building an Efficient Microservices Architecture
Whitepaper : Building an Efficient Microservices ArchitectureWhitepaper : Building an Efficient Microservices Architecture
Whitepaper : Building an Efficient Microservices Architecture
 
Pivotal gem fire_wp_hardest-problems-data-management_053013
Pivotal gem fire_wp_hardest-problems-data-management_053013Pivotal gem fire_wp_hardest-problems-data-management_053013
Pivotal gem fire_wp_hardest-problems-data-management_053013
 
Scalable Fault-tolerant microservices
Scalable Fault-tolerant microservicesScalable Fault-tolerant microservices
Scalable Fault-tolerant microservices
 
Designing distributed systems
Designing distributed systemsDesigning distributed systems
Designing distributed systems
 
A scalable and reliable matching service slide
A scalable and reliable matching service slideA scalable and reliable matching service slide
A scalable and reliable matching service slide
 
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
 
A Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System UptimeA Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System Uptime
 
Event Driven Software Architecture Pattern
Event Driven Software Architecture PatternEvent Driven Software Architecture Pattern
Event Driven Software Architecture Pattern
 
Implementing Domain Events with Kafka
Implementing Domain Events with KafkaImplementing Domain Events with Kafka
Implementing Domain Events with Kafka
 
A scalable and reliable matching service for content based publish subscribe ...
A scalable and reliable matching service for content based publish subscribe ...A scalable and reliable matching service for content based publish subscribe ...
A scalable and reliable matching service for content based publish subscribe ...
 
Stateful on Stateless - The Future of Applications in the Cloud
Stateful on Stateless - The Future of Applications in the CloudStateful on Stateless - The Future of Applications in the Cloud
Stateful on Stateless - The Future of Applications in the Cloud
 
Removing performance bottlenecks with Kafka Monitoring and topic configuration
Removing performance bottlenecks with Kafka Monitoring and topic configurationRemoving performance bottlenecks with Kafka Monitoring and topic configuration
Removing performance bottlenecks with Kafka Monitoring and topic configuration
 
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQCluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
 
Virtualization in Distributed System: A Brief Overview
Virtualization in Distributed System: A Brief OverviewVirtualization in Distributed System: A Brief Overview
Virtualization in Distributed System: A Brief Overview
 
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
 
Postponed Optimized Report Recovery under Lt Based Cloud Memory
Postponed Optimized Report Recovery under Lt Based Cloud MemoryPostponed Optimized Report Recovery under Lt Based Cloud Memory
Postponed Optimized Report Recovery under Lt Based Cloud Memory
 
Reactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/SubscribeReactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/Subscribe
 

Último

BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxalwaysnagaraju26
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
ManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456KiaraTiradoMicha
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfproinshot.com
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfayushiqss
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 

Último (20)

BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
ManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide Deck
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456LEVEL 5   - SESSION 1 2023 (1).pptx - PDF 123456
LEVEL 5 - SESSION 1 2023 (1).pptx - PDF 123456
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 

Leveraging the power of the unbundled database

  • 1. THE UNBUNDLED DATABASE Leveraging the unbundled database via distributed logs and stream processing
  • 2. 2| WhoAm I? Software and data engineering at Rackspace Hosting Software engineering at WDPROData Infrastructure at Pluralsight
  • 4. 4| Table of Contents 4 17 22 30 39 page page page page page 52 page Microservicesoverview EventDrivenServices TheDistributedLog KafkaLogSemantics TheUnbundledDatabase StreamProcessing
  • 7. 7| Independence comes at a cost COMPLEXITY Independenceisadouble-edged sword. BOUNDARIES
  • 8. 8| Services depend on each other Most business services share the same notions of core facts. This makes their futures inevitably connected. Over time, services may become unable to retain the same clear separation of concerns. Servicesareinherentlypartofabigger, interconnectedecosystem.
  • 9. 9| Data on the “inside” vs Data on the “outside” Data on the inside Encapsulated private data contained within a service. Data on the outside Information that flows between independent services.
  • 10. How do services share data? Three well-known approaches: Serviceinterfaces Messaging Shareddatabases
  • 11. 11| Service Interfaces Goal is to clearly separate concerns between services and define different bounded contexts. Synchronized changes are hard! Dataandfunctionalityareencapsulatedinthe service.
  • 12. 12| Messaging Middleware Messaging architectures can scale well. Even though messaging architectures can move massive amounts of data, they don’t provide any historical context, which can lead to data divergence over time. Dataandfunctionalityarescatteredacross organization.
  • 13. 13| Shared Databases Shared databases concentrate too much data in a single place. For microservices, databases create an usually strong rich coupling. This is due to the broad interface that databases expose to the outside world. Functionalityisencapsulatedwithintheservice;data isnot.
  • 14. 14| Data Dichotomy Service interfaces minimize the data they expose to the outside. Database interfaces, on the other hand, tend to amplify the data they hold. Datasystemsareaboutexposingdata. Servicesareabouthidingit. interface Data on the inside Data on the outside interface Data on the inside Data on the outside Databases Services
  • 15. 15| Data Diverges Overtime Different services make different interpretations of the data they consume, which leads to divergent information. Services also keep that data around; data is altered and fixed locally and soon it doesn’t represent the original dataset anymore.
  • 16. 16| Looking ahead: Sharing data with distributed logs SHOPPINGCART SERVICE CATALOG SERVICE USER SERVICE FULFILLMENT SERVICE USERCOMMIT LOG RETURNS SERVICE WRITESTO REPLICATES REPLICATES REPLICATES REPLICATES Eventsarebroadcasttoalog.
  • 18. 18| Ways services interact Queries Requests to look up some data point. Queries are side effect free and leave the state of the system unchanged. Events Events can be thought of are both a fact and a trigger. They express something that has happened, usually in the form of a notification. Commands Commands are actions in the form of side effect generating requests indicating some operation to be performed by another service. Commands expect a response.
  • 19. 19| Event Driven Services Broadcast events to a centralized, immutable stream of facts. Downstream freedom to react, adapt and change to any consumer. There are also some other interesting gains that event driven services provide, such as Exactly Once Processing. This paradigm is in a departure from request-driven services, where flow resides in commands and queries. IbroadcastwhatIdid!
  • 20. 20| Advantages Decoupling Both data producers and consumers are completely decoupled. There is no API binding them together, no synchronized changes that need to be performed. Locality Queries and lookups are local to the bounded context and can be optimized in the best way that fits the current use case. State Transfer Events are both triggers and facts, that can be used to notify and propagate entity state transfers.
  • 21. 21| The Single Writer Principle Having a single code path helps with data quality, consistency, and other data sharing concerns. This is important because these events represent durable shared facts. Asingleserviceownsalleventsfora singletype. EVENT STREAM/ LOG MATERIALIZEDVIEWS/CACHE HADOOP ETL SERVICE TRANSF Writesto Replicatesto
  • 23. 23| What’s a log? Reads and writes are sequential operations. They are, therefore, sympathetic to the underlying media, leveraging pre-fetch, the various layers of caching and naturally batching similar operations together. Ordered,immutablesequenceofrecordsthatis continuouslyappendedto.
  • 24. 24| Writing to the Log Data is stored in the log as stream of bytes. Due to their structure, logs can be optimized. For instance, when writing data to Kafka, data is copied directly from the disk buffer to the network buffer, without any memory cache. Writesareappendonly,alwaysaddedtotheheadof thelog.
  • 25. 25| Reading from the Log Both reads and writes are sequential operations. Messages are read in the order they were written. Consumers are responsible for periodically recording their position in the log. Since the log is durable, messages can be replayed for as long as they exist in the log. Readsareperformedbyseekingtoaspecific positionandsequentiallyscanning. Seek to offset and SCAN Only sequential access
  • 27. 27| Key Capabilities Storage Stores streams of records using replicated, fault-tolerant, durable mechanisms. Persists all published records — whether or not they have been consumed, using a configurable retention period. Processing Kafka can process and apply logic to streams of records as they occur. Publish / Subscribe Kafka is like a message queue or enterprise messaging system, but with some very distinct design concerns and side effects.
  • 28. 28| The Kafka Broker Resilient Retries, message acknowledgement, ack strategies, are all baked into the platform. Fault Tolerant Messages are replicated across different nodes. Linearly Scalable Scaling is a matter of adding more nodes to an existing cluster. Rebalancing, leader election, and replication are automatically adjusted. BROKER BROKER BROKER BROKER
  • 29. 29| Topics and Partitions Topics Split into ordered commit logs called partitions. Data in a topic is retained for a configurable period of time. Partitions Each message is assigned a sequential id called an offset. Allow the logs to scale beyond a size that will fit a single broker. Act as the unit of parallelism. Topicsarecategoriesorfeednamestowhich recordsarepublishedto.
  • 31. 31| Ordering guarantees Relative Ordering Messages that require relative ordering must be sent to the same partition. Message keys will map the same partition. Global Ordering Requires a single partition topic. Tends to come up when migrating legacy systems where global ordering was an assumption. Throughput limited to a single machine. Mostbusinesssystemsneedstrongordering guarantees. Service Service Keys map to the partitions Consumer Consumer Consumer Group Consumers in a group are responsible for a single partition, so ordering is guaranteed.
  • 32. 32| Message Durability Messages are written to a leader and then replicated to a user-defined number of brokers. Records can be configured to be persisted for a period of time or based on keys. Kafkaprovidesdurabilitythroughreplication. Producer
  • 33. 33| Kafka can load balance services If a consumer leaves a group for any reason, Kafka will detect this change and re-balance how messages are distributed across the remaining consumers. If the failed consumer comes back online, load is balanced again. Kafka assigns whole partitions to different consumers. In other words, a single partition can only ever be assigned to a single consumer. Since this is always true, ordering is guaranteed, across failures and restarts. Loadbalancingprovideshighavailability Consumer Consumer Consumer Group
  • 34. 34| Compaction ‘Compacted Topics’ retain only the most recent events, with any old events, for a certain key removed. They also support deletes. Compacted tops reduce how quickly a dataset grows, reducing storage requirements while also increasing performance of replication jobs. Key-baseddatasetscanbecompacted
  • 35. 35| Topic Durability If you set the retention to a ”forever” or enable log compaction on a topic, data will be kept for all time. Some use cases that support this are event sourcing, in-memory caches (with compacted topics), stream processing or change data capture. Kafkacanbeusedasalong-termstoragelayer
  • 36. 36| Public and Private Topics Some teams prefer to do this by convention, but a stricter segregation can be applied using the authorization interface. Assign read/write permissions for private topics only to the services that own them. Publicandprivatetopicsshouldbeseparatedfrom eachother. Public Topics Private Topics Processors
  • 37. 37| Use a Schema Registry A schema registry provides a centralized repository for stream metadata. Having a schema registry helps with data management, data discovery and automatic data pipelines. A schema registry can also help with proper guards around schema evolution, caching, storage and computation efficiency. Alwaysuseschemastopromoteadurable,shareable contractbetweenproducersandconsumers.
  • 38. 38| Serializing Messages withAvro Why Avro? Avro is a rich library and messaging protocol that supports direct mapping to/from JSON. It is also space efficient and fast, with wide industry support across different languages. Avro can also support automated pipelines for data replication. Avro records are also evolvable. Opensourcedataserializationprotocolthathelps withdataexchangebetweensystems,programming languages,andprocessingframeworks.
  • 40. 40| 40 Databases “We have been using the database as a kind of gigantic, global, shared, mutable state. It’s like a global variable that’s shared between all your application servers.” - Martin Kleppmann
  • 41. 41| Transactions Transactions appears to run in isolation, completely separated from each other. If any part of the system fails, each transaction is either executed in its entirety or not all. AsequenceofoneormoreSQLoperationsare treatedasasingleoperation.
  • 42. 42| WHYACID? What consistency do you really need and when? ACID is old school!
  • 43. 43| ACID 2.0 Idempotent aa = a map.put("key1", "value1").put("key1", "value1") always result in a single entry. Distributed (ab)c = (ac)b, for concurrent c and b Mostly symbolical. Commutative ab = ba max(1, 2) = max(2, 1) Associative a(bc) = abc Set(a).add(b) = set(b).add(a)
  • 44. 44| Indexes Indexes quickly locate data without having to scan every row in a database table, but incur the cost of additional writes and storage space to maintain the index data structure. Datastructurethatincreasesthespeedofdata lookupoperations.
  • 45. 45| Views Database users can query views just as they would any other persistent database object. Views can be materialized or virtual. Theresultsetofastoredqueryonthedata
  • 46. 46| Materialized Views Whenever any of the underlying data changes, the materialized view is updated too. A view precomputes the query to get the data in exactly the right form for your use case. When it comes to querying the view, all the hard stuff is already done. Querycachedandrunbythedatabase
  • 47. 47| Databases shouldn’t be shared Databases introduce an unusually strong type of coupling. This comes from the broad, magnifying interface that these systems expose to the outside world. As services interact with the rich interface provided by databases, they get sucked in. Service and database becoming increasingly intertwined. Databases are pools of global, shared mutable state.
  • 48. 48| The Unbundled Database A database is composed of several concepts rolled into a single logical unit: storage, indexing, caching, query, and transactions. Unbundling is breaking out these components and recomposing in a way that is more sympathetic to the target system. Splittingdatabaseconcernsintodifferentlayers.
  • 49. 49| Rethinking the Materialized View We will need: 1. The ability to write data transactionally into a log that maintains immutable records of these writes. 2. A query engine that replicates the journal into a view that can be queried. All these elements need to be decentralized and operate as independent entities. Kafka can handle both the log structure and atomic writes. Stream processing engines are a great fit for the role of the query engine. . Canwethinkofmaterializedviewsascontinuously updatedcaches?
  • 50. 50| Unbundled Databases are safe to share The loose coupling comes from the simple interface the log provides: seek and scan. This makes the dominant source of coupling the data itself, freeing both sides of the equation from tight coupling. Kafka becomes the centralized event stream log.Thelogprovidesasafe,immutable,low-coupling mechanismtosharedatasetsacrossservices. Service Kafka Data Create using some Query Stream Processing Query is always running
  • 51. 51| Creating Service specific views These materialized views don’t need to be long-lived. A distributed log ‘remembers’, which means the views don’t have to. Views can be ephemeral, implemented as simple caches that can be thrown away and rebuilt at anytime from the log. Unbundleddatabasesofferanapproachwheretrade offsbetweenreadsandwritesarenolongerneeded. Service Many different “custom” views. Service Service Service
  • 53. 53| Remember… Services broadcast events Services are not modeled as a collection of remote requests and commands. They become a cascade of notifications, decoupling each event source from its downstream destination. Events are triggers and facts Events make up a narrative that not only describes the evolution of the business domains over time, they also represent full datasets.
  • 54. 54| Kafka Streams Low overhead. Stream processing meant to run as part of the application. Built for reading data from Kafka topics, processing it, and writing the results to back to Kafka. Uses the Kafka cluster for coordination, load balancing, and fault-tolerance. KSQL: SQL-like interface. Ideal for ETL (KStreams) and aggregations (KTables). LibrarythatanystandardJavaapplicationcan embed.
  • 55. 55| Akka Streams Based on Akka actors. Designed for general purpose microservices. Very low latency. Mid volume, complex data pipelines. Efficient per-event processing Very rich ecosystem: Alpakka - connect to almost everything! (dbs, files, etc.) Akka cluster and Akka persistence Scala/Javaimplementationofreactivestreams
  • 56. 56| Apache Flink Operator-based computational model. Large scale. Automatic data partitioning. Exactly-once processing. Very low latency. Handles high data volumes: 1M/sec. Can run batch jobs. Clusteredstreamprocessingengine.
  • 57. 57| Apache Spark Micro batches computing model. Large scale and automatic partitioning. Medium latency, evolving to low. Handles high data volumes: 1M/sec Rich SQL, ML options. Very mature; huge community support and drivers across a variety of programming languages. Clustercomputingframeworkwiththelargestglobal userbase.WritteninScala,Java,RandPython.
  • 58. 58| Use Case: Replication Streams Builds materialized views from the writes in the log. Replication stream behaves like a database transaction log. Views are like database secondary indexes: optimized for querying and reading. There could be many different shapes of the same data: a key-value store, a full-text search index, a graph index, an analytics system, and so on. Createsin-sync,read-onlymaterializedviewsofthe underlyingloginaformatthat’smostsympatheticto thetargetsystem.

Notas del editor

  1. Technology learning platform. As a company, we are strong believers in creating progress through technology and we do this with a learning platform that features assessments, learning paths and courses created by industry experts Subscription model
  2. If you were to stumble upon the whole microservices thing, without any prior context, you’d be forgiven for thinking it a little strange. Taking an application and splitting it into fragments, separated by a network, inevitably means injecting the complex failure modes of a distributed system.
  3. What? “Microservices are small, autonomous services that work together.” Major traits: Small in size Messaging enabled Bounded by contexts Autonomously developed and deployable Decentralized Built and released with automated processes Why? Microservices are independently deployable. It is this attribute, more than any other, that give them their value.  It allows them to scale. To grow. Not so much in the sense of scaling to quadrillions of users or petabytes of data (although they may help with that), but rather scaling in people terms, as your teams and organization grow.
  4. Independence itself is a double edged sword. It means a service can iterate quickly and freely. But if one service implements a feature which requires another service to change, we end up having to make changes to both services at around the same time. Whilst this is easy in a monolithic application, where you can simply make the change and do a release, it’s considerably more painful where independent services must synchronize. The coordination between teams and release cycles erodes agility. Establishing clear responsibilities and boundaries is a challenge.
  5. The typical approach is to simply avoid such pesky, crosscutting changes by cleanly separating responsibilities between services. A Single Sign-On service is a good example. It has a well-defined role which can be cleanly separated from the roles other services play. This clean separation means that, even in the face of rapid requirement churn in surrounding services, it’s unlikely the SSO service will need to change. It exists in a tightly bounded context.
  6. Service APIs can also deteriorate into a pseudo-tightly coupling between these services as well if not designed properly. If Course service now needs to get AllAuthors how do we do it? Really bad to transfer data at any scale
  7. These two forces are fundamental. They underlie much of what we do, fighting for supremacy in the systems we build. As we evolve and grow service-based systems, we see the effects of this Data Dichotomy play out in a couple of different ways. Either a service interface will grow, exposing an increasing set of functions, to the point it starts to look like some form of kookie, homegrown database. Alternatively, frustration will kick in and we add some way of extracting and moving whole datasets, en masse, from service to service.
  8. Especially as services evolve
  9. We operate in ecosystems: groups of applications and services which together work towards some higher level goal. Making these systems event-driven has several advantages. First, we can rethink our services not simply as a mesh of remote requests and responses, but as a cascade of notifications, decoupling each event source from its consequences. Second, we can realize that events are themselves facts: a narrative that not only describes the evolution of the business over time, it also represents datasets —orders, payments, customers, etc. We can store these facts in the very infrastructure we use to broadcast them, linking applications and services together with a central data-plane that holds shared datasets and keeps services in sync.
  10. Data queries don’t cross service boundaries and each bounded context has its own local database copy.
  11. Responsibility for propagating events of a specific type to a single service. So the Stock Service would own how the ‘Inventory of Stock’ progresses forward over time, etc. This helps funnel consistency, validation and other ‘write path’ concerns through a single code path (but not a single process). Assigning responsibility for event propagation is important because these aren’t just ephemeral chit-chat. They represent shared facts, the data-on-the-outside. As such, services need to take responsibility for curating these shared datasets over time: fixing errors, handling situations where schemas change etc.
  12. Batched, sequential operations help with overall performance. They also make the system well suited to storing messages longer term. The log is O(1) when either reading or writing messages to a partition, so whether the data is on disk or cached in memory matters far less. Most traditional message brokers are built using index structures, hash tables or B-trees, used to manage acknowledgements, filter message headers, and remove messages when they have been read. But the downside is that these indexes must be maintained. This comes at a cost. They must be kept in memory to get good performance, limiting retention significantly.
  13. Kafka was shaped by a different set of forces. A problem-space rethink, optimizing for scale and fault tolerance. It is simple. Its scale comes from simplicity. From workloads that flow with the direction of the underlying hardware. Kafka’s represents this.
  14. So the log is an efficient data structure for messaging workloads. But Kafka is really many logs. It’s a collection of files, filled with messages, spanning many different machines. Much of Kafka’s code involves tying these various logs together, routing messages from producers to consumers reliably, replicating for fault tolerance and handling failure.
  15. Producers spread messages across different partitions The main advantage of this, from an architectural perspective, is that it takes the issue of scalability off the table. With Kafka, hitting a scalability wall is virtually impossible in the context of business systems. This can be quite empowering, especially when ecosystems grow, allowing implementers to pick patterns that are a little more footloose with bandwidth and data movement.
  16. An offset is a special, monotonically increasing number that denominates the position of a certain record in the partition it is in. It provides a natural ordering of records—you know that the record with offset 100 came after the record with offset 99. parallelize a topic by splitting the data in a particular topic across multiple brokers 
  17.  Say a customer makes several updates to their Customer Information. The order in which these are processed is going to matter, or else the latest change might be overwritten with an older value. The second use case we need to be careful of is retries. In almost all cases we want to enable retries in the producer, so that if there is some network glitch, long running GC, failure, etc, any messages that fail to be sent to the broker will be retried. The subtlety is that messages are sent in batches, so we should be careful to send these batches one at a time, per destination broker, so there is no potential for a reordering of events when failures happen and whole batches are retried. This is simply something we configure (max.inflight.requests.per.connection).
  18. To make best use of replication, for sensitive datasets like those seen in service-based applications, configure three replicas per partition, set acknowledgements to “all” and min.insync.replicas to 2. This ensures that data is always written to at least two replicas, but Kafka will continue to accept messages should you lose a machine.
  19. Highly sensitive use cases may require that data be flushed to disk synchronously. This can be done by setting log.flush.interval.messages = 1 — but this configuration should be used sparingly. It will have a significant impact on throughput, particularly in highly concurrent environments. If you do take this approach, increase the producer batch size, to increase the effectiveness of each disk flush on the broker (batches of messages are flushed together).
  20. By default, topics in Kafka are retention based: messages are retained for some configurable amount of time.  Kafka also ships with a special type of topic that manages keyed datasets: that’s to say data that has a primary key (identifier) and you might put in, or get from, a database table. 
  21. Because of things like compaction, Kafka can be used as a long-term storage layer. Some use cases that support this are event sourcing, in-memory caches (with compacted topics), stream processing or change data capture.
  22. Idea is data quality, discovery and curation. Datasets are able to describe themselves.
  23. Kafka also ships with a special type of topic that manages keyed datasets: that’s to say data that has a primary key (identifier) and you might put in, or get from, a database table. 
  24. Atomicity -  Guarantees that each transaction is treated as a single "unit", which either succeeds completely, or fails completely: Consistency - a transaction can only bring the database from one valid state to another, maintaining database invariants (constraints, triggers, etc.) Isolation-  Ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.  Durability - Once a transaction has been committed, it will remain committed even in the case of a system failure
  25. Associativity - the order of function execution doesn't matter. It means that in the following case both executions are equal: f(a,f(b,c)) = f(f(a,b),c).  This is the idea behind compaction and patches. Commutative -  this property is similar to the previous one except it's related to the function arguments. It means that these arguments are order-insensitive, so f(a, b) = f(b, a). Idempotent -  the idempotency means that the function always, even if it's called multiple times, returns the same result, such as: f(f(f(x))) = f(x Distributed - its meaning is mainly symbolical. It highlights the fact that the ACID 2.0 applies on the distributed systems.). 
  26. Views can represent a subset of the data contained in a table. Consequently, a view can limit the degree of exposure of the underlying tables to the outer world: a given user may have permission to query the view, while denied access to the rest of the base table. Views can join and simplify multiple tables into a single virtual table. Views can act as aggregated tables, where the database engine aggregates data (sum, average, etc.) and presents the calculated results as part of the data. Views can hide the complexity of data. For example, a view could appear as Sales2000 or Sales2001, transparently partitioning the actual underlying table. Views take very little space to store; the database contains only the definition of a view, not a copy of all the data that it presents. Depending on the SQL engine used, views can provide extra security.
  27. Views can represent a subset of the data contained in a table. Consequently, a view can limit the degree of exposure of the underlying tables to the outer world: a given user may have permission to query the view, while denied access to the rest of the base table. Views can join and simplify multiple tables into a single virtual table. Views can act as aggregated tables, where the database engine aggregates data (sum, average, etc.) and presents the calculated results as part of the data. Views can hide the complexity of data. For example, a view could appear as Sales2000 or Sales2001, transparently partitioning the actual underlying table. Views take very little space to store; the database contains only the definition of a view, not a copy of all the data that it presents. Depending on the SQL engine used, views can provide extra security.
  28. In the old days of monoliths, this didn’t matter. There were no nasty side effects, because when you released an application you released the database too, so it didn’t matter if the application was tightly coupled to the database or not: they both changed together.
  29. We create independent, data-enabled components that can be composed in parallel or embedded within your services.
  30. But unbundling goes one step further, when applied in a microservices context. It provides services with a kind of ‘database’ that they can share, one that avoids both an amplifying interface as well as shared mutable state. Being decentralised means the views can be placed anywhere. They might be a standalone entity. They might be embedded inside a service. The view can take any form you want, and can be regenerated on a whim
  31. The implications of this change in workflow go a little deeper still. Services are encouraged to take only the data they need, at a point in time. This keeps the views small and lightweight. It also reduces coupling. We will dive into this consequence more fully in the next post.
  32. We can store these facts in the very infrastructure we use to broadcast them, linking applications and services together with a central data-plane that holds shared datasets and keeps services in sync.
  33. Data-centric microservices
  34. Data-centric microservices What’s missing?
  35. The implications of this change in workflow go a little deeper still. Services are encouraged to take only the data they need, at a point in time. This keeps the views small and lightweight. It also reduces coupling. We will dive into this consequence more fully in the next post.
  36. There could be many different shapes of the same data: a key-value store, a full-text search index, a graph index, an analytics system, and so on.