I’d like to start with a story

A few months back I was at work, and received a twitter notification
@svpember
A guy I know tweeted this at me:
@svpember
Oliver Gierke, who’s the Spring Data lead at Pivotal, sent out this tweet.

<read>

So I, uh, rather enthusiastically responded…
@svpember
by deluging the poor guy with tweets until he responded with ‘heyyy this has been helpful, thanks! *Upside-down smiley face*”

Other people jumped on the conversation too, of course, but I think I was the most. ah. enthusiastic.
@svpember
One of the other participants tweeted this: “maybe you should make a blog post?”

Which is wonderful, right. Validation from strangers on the internet!

I thought better yet, I’ll make a talk.
Event Storage in a
Distributed System
Steve Pember
CTO, ThirdChannel
steve@thirdchannel.com
Software Architecture Conf 2018: NYC
@svpember
So I wrote it up, submitted it, and here we are.

My name is Steve. I work for a company called ThirdChannel, out of Boston.

Now, I realized that I accidentally hit the 90-minute checkbox when submitting this talk, so, ah, the scope of this is a little bit more involved than the title initially suggests.

Let’s talk about Events
It’s all about Events
This talk is all about events. You’re going to be sick of hearing the word by the end of this presentation.

However, bear with me. It is important.

I feel that systems that are not representing transitions within themselves as events, and are not actively listening to or taking advantage of these internal events… that are not, quote-unquote, 'reactive'… are missing out on huge advantages in flexibility, reporting, and scalability, both in terms of deployments and in terms of operational and developmental scalability.

Now, after that bold statement… well, we’ll get into all of that, but first we should discuss the most fundamental question…
What is an Event?
What is an event? 

Does anyone want to take a stab at defining what we should consider an event?
@svpember
Event
• Something that has occurred in the past within some software
• Intended for consumption by other software
• Distribution is often asynchronous
• Often contains data detailing the event
• Immutable
Well, I would classify it as…

a piece of data that signifies some action has been performed in the past within some software.

The two most important bits: in the past, and immutable.
@svpember
So, events like "Order Placed"…

are all great. They denote that something has happened.

Immutability is another important part. Let's say an 'ItemsShippedEvent' was emitted with a value of 5. It would potentially be disastrous for something to later change that value to, say, 1000, right? It would disrupt all meaning.
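To make that concrete, here's a minimal sketch (plain Java; the names are hypothetical and not from the original slides) of an event as an immutable value:

import java.time.Instant;
import java.util.UUID;

// An immutable domain event: all fields are final, set once, never changed after emission.
public final class ItemsShippedEvent {
    private final UUID eventId;      // unique id for this event
    private final UUID orderId;      // the entity this event belongs to
    private final int itemsShipped;  // the delta: how many items went out
    private final Instant occurredAt;

    public ItemsShippedEvent(UUID orderId, int itemsShipped, Instant occurredAt) {
        this.eventId = UUID.randomUUID();
        this.orderId = orderId;
        this.itemsShipped = itemsShipped;
        this.occurredAt = occurredAt;
    }

    public UUID getEventId() { return eventId; }
    public UUID getOrderId() { return orderId; }
    public int getItemsShipped() { return itemsShipped; }
    public Instant getOccurredAt() { return occurredAt; }
    // No setters: once emitted, the event's meaning cannot be rewritten.
}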
–Martin Fowler “Domain Event”
https://martinfowler.com/eaaDev/DomainEvent.html
Things happen. Not all of them are interesting,
some may be worth recording but don’t provoke
a reaction. The most interesting ones cause a
reaction. Many systems need to react to
interesting events. Often you need to know why a
system reacts in the way it did.
Because it wouldn’t be an Architecture talk without a Fowler quote…

Another way to think of events.. which frames most of this discussion, is a quote from Martin Fowler:

don’t read!

Basically: “Important events cause reactions elsewhere in the system, and it’s often important to understand why those reactions occurred”.
As an aside… Reacting to events may be nothing new to Javascript or frontend developers

	 •	 Your browser’s DOM, Javascript, and I suppose UIs in general are full of events. Literally anything you do on the browser generates an event.
Move the mouse, click a box, type a letter, let go of a letter, etc.

	 •	 While the knowledge of this talk is transferable to the frontend to some extent, the majority of this talk is focused on the server side.

The server side doesn't traditionally deal with a lot of events, I'd say, particularly if you started your dev life and career with big frameworks like Rails, Grails, Django, etc.
It's not hard to do, to program in terms of events.

Generally, one or more events are created when a user successfully performs an action or Command (slide of various event names, will reuse this aesthetic later). They represent successful deltas or actions that have occurred in the past.

Now, here, we have code which accepts some incoming Command object to create a new TodoList, validates it, and generates two events based on that
command, saves the events, saves a projection of the current state of the TodoList entity (though this is optional, as I could recreate the state of the todo
list entirely from the events), and transmits them.
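The slide's code isn't reproduced in these notes, but a rough sketch of that flow (plain Java; the command, event, repository, and publisher names are hypothetical) might look like this:

import java.util.List;

// Validate the command, derive events from it, persist them, then publish them.
public List<DomainEvent> handle(CreateTodoListCommand command) {
    command.validate(); // throws if the command is malformed

    TodoListCreatedEvent created = new TodoListCreatedEvent(command.getListId(), command.getName());
    TodoItemsAddedEvent itemsAdded = new TodoItemsAddedEvent(command.getListId(), command.getItems());
    List<DomainEvent> events = List.of(created, itemsAdded);

    eventRepository.saveAll(events);        // append the events to the journal

    // Optional: store a projection of current state; it could always be rebuilt from the events.
    todoListRepository.save(TodoList.from(events));

    eventPublisher.publish(events);         // transmit to other interested parties
    return events;
}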
My domain objects, or 'Entities', start becoming highly functional as they acquire methods to manipulate them by applying events.

This is certainly not production code, but you can see how my entities start acquiring handlers that, when provided various events, know how to use those events to update their internal state. In production, I'd probably have the entity use an internal mutable builder; the builder would receive the events and then spit out a validated, immutable Entity. But alas, this is an example.
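Roughly, the shape of such an entity (again a plain-Java sketch; TodoListCreatedEvent, TodoItemsAddedEvent, and DomainEvent are hypothetical names):

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// One handler per event type; each handler uses the event to update the entity's internal state.
public class TodoList {
    private UUID id;
    private String name;
    private final List<String> items = new ArrayList<>();

    public void apply(TodoListCreatedEvent event) {
        this.id = event.getListId();
        this.name = event.getName();
    }

    public void apply(TodoItemsAddedEvent event) {
        this.items.addAll(event.getItems());
    }

    // Replay an ordered stream of events to arrive at current state.
    public static TodoList from(List<DomainEvent> events) {
        TodoList list = new TodoList();
        for (DomainEvent event : events) {
            if (event instanceof TodoListCreatedEvent) {
                list.apply((TodoListCreatedEvent) event);
            } else if (event instanceof TodoItemsAddedEvent) {
                list.apply((TodoItemsAddedEvent) event);
            }
        }
        return list;
    }

    public String getName() { return name; }
    public List<String> getItems() { return items; }
}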
@svpember
A slide back I mentioned that events need to be transmitted. Well, these events need to be seen by others to be useful. It's one thing to have my entity only see its own events, but it's entirely another thing to share… and mix events from across the system.

And so, we need some method of transmitting these events to other interested parties.

Both internal and external.

For internal, this typically involves some asynchronous publish/subscribe mechanism. Tools that I've used successfully for this purpose include a library called Project Reactor, and Reactive Streams via the RxJava library.
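As a hedged sketch of what that internal pub/sub can look like with Project Reactor (this uses Reactor's Sinks API from recent 3.x releases; DomainEvent is a hypothetical marker type):

import reactor.core.publisher.Flux;
import reactor.core.publisher.Sinks;

// An in-process publish/subscribe bus backed by a multicast sink.
public class InternalEventBus {
    private final Sinks.Many<DomainEvent> sink = Sinks.many().multicast().onBackpressureBuffer();

    // Interested modules subscribe to this stream and react to the event types they care about.
    public Flux<DomainEvent> events() {
        return sink.asFlux();
    }

    // Emit an event to all current subscribers; subscribers can add publishOn(...) if they want
    // to process events off the emitter's thread.
    public void publish(DomainEvent event) {
        sink.tryEmitNext(event);
    }
}

A consuming module might then do bus.events().ofType(TodoListCreatedEvent.class).subscribe(...) to react only to the event types it cares about.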
@svpember
Externally, these events can be transmitted either via point-to-point HTTP or via some asynchronous message queue… which, as we'll see later, is my preferred method.
At this point you might be saying… ok, cool

but why…

That’s one point of criticism I often get for my talks. I mention all these great things I’m working with but neglect to really hammer home the ‘WHY’ of it all.

So.. why? why should you care about any of this? so far it just looks like I’m adding a bunch of extra hassle for you.
The reason is that events, these smaaaaallll bits of information are collectively extremely powerful.

The point of this presentation, the synopsis, the end goal, is to try and show you that tracking events, persisting them, and treating them as first-class
citizens within your system is a wise idea with loads of potential benefit. AND that there are some caveats to be aware of when we talk about how to store
these events and work with them within a distributed environment.

However, there are some steps in the way to get there.
@svpember
Overview
• Event-Oriented Distributed System Architecture
Today, we’re going to discuss some information that I’ve broken down into the following topics:

<read> The architectural background of this talk. This is an architectural conference, and it’s important. This will cover some concepts and architectural
designs to help prepare your systems to think in terms of events.

And because we’re architects, we’ll probably have some boxes and lines drawn up on the screen, because it wouldn’t be an architectural presentation
without good ol boxes and lines
@svpember
Overview
• Event-Oriented Distributed System Architecture
• The Power of Events
After we get in the mindset of working with events and architecting our systems to operate in an event-first fashion, we'll look into why you should be excited about having events lying around as first-class citizens within your app. I do this topic second, as I think the architecture portion is the harder pill to swallow… plus I think that getting your head around the existence of events makes it easier to start to see their usefulness. Although I could be wrong; let's give it a go.
@svpember
Overview
• Event-Oriented Distributed System Architecture
• The Power of Events
• Event Storage & Lifecycle
After that, we’ll discuss the impact of storing events, different patterns for doing so, and some lifecycle concerns for services in an event based
environment
@svpember
Overview
• Event-Oriented Distributed System Architecture
• The Power of Events
• Event Storage & Lifecycle
• Day to Day Concerns / Working with Events
And then finally, some additional details of working with events that don’t necessarily have to do with storing them.

Alright… let’s begin
Let’s start with Microservices
First, a few minutes on Microservices
I’ve borrowed this slide before. Thanks Alvaro!

I like it because it’s honest. You start with a mess and if you’re not careful you end up with a distributed mess

Anyway…

	 •	 How many of you have attended SACon before? This is, I think, my third or fourth time over 4 years. I’m pretty sure the first two years were
almost exclusively talks about microservices. I know I contributed to it, eh?

	 And there’s a good reason

	 This notion of Microservices has been a great transformational thing in software development and architecture. Even if you think it’s a rehash of SOA,
it still has been promoting the virtues and popularity of distributed systems with the larger community… which I think is a good thing.

	 •	 Now… Just to get a general poll… who’s working with them?

	 And who here likes working with them?

Any hands go down? Aw, some jaded folks.
@svpember
The Promise of Microservices
• Reduced complexity per service
• Easier for developers to understand a single service
• Teams work with more autonomy
• Independent scaling, deployments, builds
• Fault isolation
• “Right tool for the job”
• Isolation and decoupling
• Continuous Delivery and Deployment
The promise of Microservices is very alluring, yeah?

Right out of the gate, we immediately reduce the complexity of our codebase by making it several smaller codebases

My favorite: It allows teams to work in great autonomy, with improved isolation and decoupling. 

It allows for independent scaling of services.

The most powerful is that microservices allow for Continuous Delivery and Continuous Deployment. Which, honestly, is… I think the pinnacle of efficiency a software dev team should be striving for.

Now I’m being a bit flippant about that because we should be concerned about testing and regressions of our releases, of course… but all of this is another
topic entirely.
@svpember
… Some Caveats
• Vastly increased infrastructure complexity
• So much Ops
• Teams need to handle all lifecycle steps of service deployment
• Conceptual difficulty with multiple service deployments
• Potential performance hits for intra-service comms
As useful and as powerful as all of that is… there are absolutely some tradeoffs when using microservices.

You immediately… IMMEDIATELY have increased complexity in your infrastructure.

And going back to my point that micro services have been good for growing awareness within the community… the rise of tooling in this space has just
been insane. Kubernetes, Hashicorp’s entire business model… it’s great stuff

The point here is that if your team isn’t ready to shift the complexity of the codebase into infrastructural and ops complexity, you should probably hold off.
A few things bothered me…
That being said, I still think that the Microservice approach is very useful.

However, as it's been growing, three points have always bothered me that I never felt were fully discussed or agreed upon in the presentations I've seen and material I've read.

Just as a preview… did everyone see the keynotes yesterday morning? Cornelia Davis started out her presentation listing issues with distributed systems
and it spoke to my soul man. She basically gave this talk already.
@svpember
Questions about Microservices
• How should they communicate?
First: how should these services communicate?
@svpember
You see, when the term 'microservice' became all the rage, it was my observation that people were building services which utilized point-to-point synchronous HTTP comms to query, post, etc. data between services. There'd be service discovery systems in play, necessary to make services aware of each other's existence.

These synchronous calls utilize resources (e.g. threads), block, take time… and if a service goes down, what happens if another service is reliant upon it?
@svpember
slide of multiple services being needed to support a single hop

And I'm aware of Netflix's Hystrix and other circuit breaker technologies to help with all of this, but it still seems a lot could go wrong in that chain.
Time to Go Reactive
To address point 1, I suggest embracing a design pattern known as ‘Reactive’
@svpember
Reactive Systems
• Communication between services driven by asynchronous events
Has anyone here heard of the ‘Reactive Manifesto’?

I'm a big fan of it, but I'm going to put a bit of a spin on its tenets to fit my narrative here.

Anyway, when I say 'Reactive', we don't mean 'React.js' or Reactive Streams (though I love both of those things).

It’s a design philosophy to apply to systems to help them achieve high scalability.

The first rule for Reactive Systems is that…

On a positive note, based on what I've been seeing over the past year or so, the collective opinion is moving away from entirely HTTP to be more event driven, which is great. There are, for example, several talks at this conference on this very subject.
@svpember
Reactive Systems
• Communication between services done by asynchronous events
• Services ‘React’ to observed events
Anyway, point two!
@svpember
Reactive Systems
• Communication between services done by asynchronous events
• Services ‘React’ to observed events
• Use some Message Broker technology to promote Async and reduce
Data loss
@svpember
Reactive Systems
• Communication between services done by asynchronous events
• Services ‘React’ to observed events
• Use some Message Broker technology to promote Async and reduce Data
loss
• Synchronous HTTP calls between services kept to a minimum
By reducing the number of synchronous calls, we gain a few main benefits:

- less resource contention on the thread pools of each of our services

- a firewall-like effect: if services die, they don't cause other systems to fail or have to rely on fallback circuit breaker code

- we can reduce the number of calls by collecting data from other services

One side effect of this and the previous points is that your platform should become quite fast, as well as have… <next>
@svpember
Reactive Systems
• Communication between services done by asynchronous events
• Services ‘React’ to observed events
• Use some Message Broker technology to promote Async and reduce Data
loss
• Synchronous HTTP calls between services kept to a minimum
• Resiliency against failing services
- By reducing or eliminating runtime dependencies on other services, each service can function in isolation and will not be brought down by failing services.

And that's my basic overview of Reactive systems. Everyone with me so far? No? Too bad, let's keep going.
@svpember
Questions about Microservices
• How should they communicate?
• How “large” should a micro service be?
• How much responsibility should a single service have?
Now, the next two…

<read>

I feel these can be solved or addressed by an architectural design pattern known as… <next>
Domain Driven Design
Command Query Responsibility
Segregation
Domain Driven Design (or DDD) and a related variant called Command Query Responsibility Segregation (aka CQRS)

are two architectural patterns intended for very complex systems. These are not trivial things to understand, and you should take care if you plan to adopt them. They have great power, though.
- Eric Evans
“Some objects are not defined primarily by their
attributes. They represent a thread of identity
that runs through time and often across
distinct representations. Sometimes such an
object must be matched with another object
even though attributes differ. An object must be
distinguished from other objects even though
they might have the same attributes.”
One of the most interesting parts of DDD, one that really stuck with me, is this quote: <read quote>

- that’s interesting, yeah? 

               If I change my name, am I no longer me? Of course not, I’m defined by more than my name. If I change my email address, or my address, or my
social security number for some reason, am I no longer me? Obviously not… Can your database understand identity changes like this and still be able to
find the original object?
@svpember
Domain Driven Design
• Ubiquitous Language
DDD has several interesting ideas besides that quote of course. When building a system adhering to DDD, it offers several guidelines.

The first: Ubiquitous Language.

Everyone in the company should be speaking in the same terms. The same concepts. Your classes and objects should reflect the Language.

Marketing should be using the same terms as Sales and as Engineering. When Engineering builds a new Feature or new Service the entire company should
be using it. If you’re an e-commerce app and product management decides to name the Product Catalog… ah… ‘Zephyr’ for some reason, Engineering is
also calling it Zephyr, and there better be a Zephyr.java file somewhere in the repo.
@svpember
Domain Driven Design
• Ubiquitous Language
• Entities / Value Objects
Entity Objects are those that you truly care about tracking. These objects will have specific identifiers and you will take great care in maintaining their
individuality and their relationship to other objects.
Value Objects: when you know you have a lot of something, but you don’t care about the identity of each individual item.

The example that I believe is given in the DDD book is that an automobile engine might be an Entity (it has a unique serial number that mechanics might care to track), while the wheels… well, the car has 4, and I don't necessarily assign any uniqueness to them.
@svpember
Domain Driven Design
• Ubiquitous Language
• Entities / Value Objects
• Aggregates
Third term: Aggregates
Group of Entities With One Root
… the Aggregate Root. The root acts as the parent or entry point when referencing the aggregate, which leads to the next point:
@svpember
Domain Driven Design
• Ubiquitous Language
• Entities / Value Objects
• Aggregates
• Bounded Contexts
powerful concept

logical grouping of related functionality
@svpember
The key point here is that Objects inside an Aggregate may hold references to other Roots. But only the root id. They may not hold any information about
entities below the root within that context.
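A tiny sketch of that rule in code (hypothetical Order and OrderLine names, not from the slides):

import java.util.List;
import java.util.UUID;

// The aggregate holds its own internal entities directly, but refers to other aggregates only by
// their root's id; it never reaches into another aggregate's internals.
public class Order {
    private final UUID orderId;
    private final UUID customerId;        // another aggregate root, referenced by id only
    private final List<OrderLine> lines;  // entities inside this aggregate, held directly

    public Order(UUID orderId, UUID customerId, List<OrderLine> lines) {
        this.orderId = orderId;
        this.customerId = customerId;
        this.lines = List.copyOf(lines);
    }
}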
Combine and Isolate related
objects into Modules
It’s a natural step then in your code to ensure that you combine and isolate all related objects within a context into one module.
No Direct comms across
boundaries
And, no direct communications are allowed across context or module boundaries.

Well if that’s true, what do I do if I need information from across boundaries?
@svpember
Domain Driven Design
• Ubiquitous Language
• Entities / Value Objects
• Aggregates
• Bounded Contexts
• Domain Events
@svpember
Here we have several of our Modules - whose names mirror our Ubiquitous Language, btw. They cannot talk to each other directly, but rather through some
intermediary mechanism. Now, it could be direct message passing, but this differs from importing and calling methods directly in the module. 

You could use a Pub Sub Mechanism, or some sort of message broker… etc, etc.

The important bit is that the modules are bounded away from each other.
Events Are Transactionally Safe
That is, no Events are emitted until an item is successfully saved to disk. The event is part of the transaction.
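A hedged, Spring-flavored sketch of that idea (the repositories and the publisher are hypothetical; the publisher is assumed to defer the actual send until the commit succeeds, for example via an outbox table or an after-commit callback):

import org.springframework.transaction.annotation.Transactional;

// The entity and its events are written in the same database transaction; nothing downstream
// can observe an event whose underlying write was rolled back.
@Transactional
public void handle(PlaceOrderCommand command) {
    Order order = new Order(command.getOrderId(), command.getCustomerId(), command.getLines());
    OrderPlacedEvent event = new OrderPlacedEvent(command.getOrderId(), command.getLines());

    orderRepository.save(order);   // current-state record
    eventRepository.save(event);   // event journal row, same transaction

    afterCommitPublisher.publish(event);   // assumed to actually send only after commit
}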
@svpember
Domain Driven Design
• Ubiquitous Language
• Entities / Value Objects
• Aggregates
• Bounded Contexts
• Domain Events
• One Last Takeaway…
Wait… what if I need to share
across contexts?
This may not be phrased correctly, but the answer to this is one of the most powerful aspects of DDD and one of the hardest to get used to.

Bounded contexts are isolated, autonomous components with their own entities, classes, service objects, etc…. However, all the bounded contexts exist
within the same system, and certain concepts or Entities will likely exist throughout the entirety of a system… although each context may only care about a
subset of the info about that entity. 

Or, another way to phrase it: each bounded context is only concerned with some subset of an Entity within a system, and no context will know the entire
set of information about an entity. This separation is the concern of the context’s boundary
@svpember
- For example!

The catalog context knows how much inventory is left for this particular SKU, but the shopping cart and the admin context don’t necessarily need the
information. The inventory count may be a function of a ‘Warehousing context’ that the catalog receives events for.

Similarly, the Shopping Cart context contains the quantity, the # of this SKU that the user wants to purchase. That information has no bearing on the
catalog context.

This concept - that an entity can exist in multiple contexts, though each context is only concerned with a subset of that information - is very powerful, and
very useful.

Understanding which information belongs in which context… and maintaining that decoupling, is however, one of the toughest aspects of DDD, and can be
a challenge for younger developers or those newer to ddd to grasp.

For example, we recently had a situation…
And now, CQRS
Command Query Responsibility Segregation is an evolution of DDD that calls for changes in how one accepts and sends data.
MVC
With your standard MVC/ CRUD style approach that you get out of the box with many big frameworks, the pattern generally is something like the following:

- user makes a change via the ui, let’s say to a Product object

- There’ll likely be a Product Controller which takes that data

- passes it to a Product Model

- which in turn gets validated and saved to the database, likely in a table named ‘product’

When the user wants to retrieve information about the product, the same objects are used. Query for product with id X, product controller uses a product
model to retrieve data from the product row, then passes the retrieved model up to the ui
CQRS
CQRS says: why use the same objects for every task? It makes a hard distinction between modifying actions the user is attempting to do - which it calls
‘Commands’ - and data retrieval tasks - aka Queries - , and thus breaks up the underlying code to enforce that distinction.

So, If a user wants to make a change to some data, say again, a Product, he manipulates the change in the ui, maybe clicks a button, and a relevant
controller converts that request into a ProductChangeCommand object or model, which contains details on what the user is trying to change. That
command is then validated, then the changes are persisted AND, while not shown here, domain events are emitted. 

As for Querying, a User dictates the query, it’s packaged by the query controller into a query object, results are pulled from the database, and the response
is returned to the user.

While it reads similarly, it's important to understand that the models are different, and often advantageously so. The Product information I display to the end user may be only a subset of the Product model / entity object, so I only pull a few fields from the db. Or, my query object represents a composite of several models in one multi-faceted report that is pulled in one query.
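To make the distinction concrete, here's a hedged sketch (plain Java, hypothetical names, not taken from the slides) of a command model next to a read-side view model:

import java.math.BigDecimal;
import java.util.UUID;

// Command: captures the intent to change something; it gets validated before anything is persisted.
final class ChangeProductPriceCommand {
    final UUID productId;
    final BigDecimal newPrice;

    ChangeProductPriceCommand(UUID productId, BigDecimal newPrice) {
        this.productId = productId;
        this.newPrice = newPrice;
    }
}

// Query model: a flat, read-optimized view holding only what the UI actually displays,
// possibly composed from several contexts' events.
final class ProductSummaryView {
    final UUID productId;
    final String name;
    final BigDecimal currentPrice;
    final int unitsInStock;   // might be maintained from a Warehousing context's events

    ProductSummaryView(UUID productId, String name, BigDecimal currentPrice, int unitsInStock) {
        this.productId = productId;
        this.name = name;
        this.currentPrice = currentPrice;
        this.unitsInStock = unitsInStock;
    }
}

The point is that neither class knows about the other; the write side validates commands and emits events, and the read side builds whatever shape the query needs.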
CQRS
Following that line of thinking, we can extend this a bit further.

We can isolate our writes and our reads into separate contexts or services. Besides the nice decoupling, it allows us to get creative in other areas:

- want to scale out our write capability vs our reads? no problem. just scale up one of those services

- if our write service emits domain events when it saves, could our query service listen for other domains’ events’? sure!

- want to have custom query reports that pull from multiple domains? no problem! You can build multiple query models that are highly targeted towards
what ever end user report or experience you’re trying to deliver.
Allows for interesting
Architectures
Continuing that line of thinking, it allows for some very interesting Architectures
This is a graphic taken from Udi Dahan's website; he's another pioneer in the CQRS space.

What this diagram is trying to depict is similar to what I've been describing.

The blocks labelled ‘AC’ stand for autonomous component, I think. Think of them like a service.

So, in the bottom left, the user enters a command to the first service. it succeeds and the changes are written to local storage. Events are published and
retrieved by one or many other services, who update their local query caches based on those events. Then, when the user performs a query, it hits that
highly targeted query cache, giving the user the intended results with a minimum of sql queries or joins.
Ok then, this still seems like a lot of trouble… so why? Why go through it all?
@svpember
Reactive + CQRS
• System is ideal for both write- and read-heavy applications
	 •	 writes are contained for a particular service / bounded context

	 •	 scale up services receiving writes

	 •	 create query caches specifically designed to handle queries

	 •	 can scale those up too

	 •	 Efficient Querying -> just going to highlight this again

	 •	 One note: due to the distributed (and, as we'll see soon, event-based) storage, if your company has analysts they will likely hate you. You'll likely need to build a service or query store solely for them to run SQL queries against.
@svpember
Reactive + CQRS
• System is ideal for both write- and read-heavy applications
• Service Design via Bounded Contexts
Bounded Contexts are an excellent tool to determine microservice responsibility and, potentially, the sizing of a service. I.e., how big should it be? When do we create a new one?

We went through a team exercise where we tried to figure this out… 

- Each circle represents a context boundary

- big outer circle is 3c itself

- four big inner circles are different functional areas of our company

- small inner inner circles represent contexts further still

What we found was that our services for the most part mapped to the smaller circles, which was great, although there was much duplication (e.g. services belonging to multiple contexts) and some that we identified should be combined. Those are both bad.
@svpember
Reactive + CQRS
• System is ideal for both write- and read-heavy applications
• Service Design via Bounded Contexts
• Failure Protection
@svpember
	 •	 For a sufficiently large system, something is always going to be in a failure state

	 •	 reducing or eliminating calls between services when handling a Command or a Query eliminates the dependency on that or those additional
services

	 •	 Service being queried will still function

	 •	 One of the tenets of the 'Reactive Manifesto'; it provides a stopgap for failures impacting the user.

	 •	 For example, if the Shopping Cart Management service is down, my product catalog service should not be affected and the user should still be
able to browse the catalog

	 •	 Additionally, using a durable Message Broker for communication grants additional layers of protection. We use RabbitMQ, but loads of folks have great success with tools like Kafka. These tools will hold on to messages, allowing consuming services to consume them at their leisure. This has advantages in situations where a service is down for a period of time. Or, imagine a service cannot handle a message and the devs need to fix it; the message waits on the queue until the service is back online, resulting in no data loss.

image - broken down service, happy product catalog
@svpember
Reactive + CQRS
• System is ideal for both write- and read-heavy applications
• Service Design via Bounded Contexts
• Failure Protection
• Promotes Async comms
	 •	 eliminating or reducing calls between services when handling Commands and Queries also eliminates blocking, synchronous calls to these
services

	 •	 reduces resource contention on thread pools

	 •	 Also eliminates a potential failure vector: services ‘backing up’ with chains of communications
@svpember
Reactive + CQRS
• System is ideal for both write- and read-heavy applications
• Service Design via Bounded Contexts
• Failure Protection
• Promotes Async comms
• “Simple” Testing
	 •	 I put ‘simple’ in quotes there because no matter what I say next… this is still a distributed system we’re talking about

	 -	 Each service can be heavily unit and integration tested in isolation of each other

	 •	 Testing of the platform as a whole is important and useful. At least for us, a good chunk, if not most, of our bugs are contract violations in the incoming commands and events (e.g. I thought event A was emitting two fields but it's actually three and I didn't have JSON ignore-properties set in Jackson, or a service misspelled a variable name, or someone changed event A's fields).

	 •	 To me, this means that our organization is lacking communication and is violating the CAP theorem, or it’s being proven true, or however you
want to say it. The point here is to make sure that other teams are aware of the shape of events emitted from your services.
@svpember
Reactive + CQRS
• System is ideal for both write- and read-heavy applications
• Service Design via Bounded Contexts
• Failure Protection
• Promotes Async comms
• “Simple” Testing
• Reduces cross-service Querying
One of the most important reasons you do something like this? To avoid having to write queries that would otherwise span multiple services. Has anyone had to write queries that span multiple services? It's a nightmare. So inefficient.

So, while this approach may feel like you’re duplicating data or whatnot, it’s quite efficient
@svpember
Reactive + CQRS
• System is ideal for both write- and read-heavy applications
• Service Design via Bounded Contexts
• Failure Protection
• Promotes Async comms
• “Simple” Testing
• Reduces cross-service Querying
• Also…
And it also leads to one of my absolute favorite concepts…
EVENT
SOURCING!
@svpember
Event Sourcing
• Alternate Storage Pattern -> Prevents Data Loss
@svpember
More specifically, it's an alternative to your standard ORM storage mapping,

where an object in memory maps directly to a row in a database, even if that row may be split via joins.

* An update is made to a model, which updates a column in your database.

* In this method, the only thing you know about your data is what it looks like right now.
@svpember
Event Sourcing
• Alternate Storage Pattern -> Prevents Data Loss
• Store Deltas
Store Deltas, not Current State
@svpember
This stream of events is persisted in our database in the order they occurred, as a journal of what has transpired.

These events can then be played back against our Domain Object, building it up to the state it would be at any given point in time, although this is most likely the current state.
@svpember
Event Sourcing
• Alternate Storage Pattern -> Prevents Data Loss
• Store Deltas
• Additive Only
• Time Series
Never Delete, Never Update
It means you only ever insert data into your database. No event rows are ever, ever, ever updated or deleted. In so doing, you've turned your events table into an append-only journal, which is very efficient for most databases.

In other words, events are immutable!
@svpember
Event Sourcing
• Alternate Storage Pattern -> Prevents Data Loss
• Store Deltas
• Additive Only
• Time Series
• It’s Old
Rather, this basic idea of storing deltas, rather than just current state has been around for a long time
- Every transaction you make with your bank. Every Credit or Debit made is logged, along with an audit trail of who (e.g. which teller) made the change. 

- To get your balance, your bank simply adds up each of these transactions

- May also periodically record what the balance was at specific points in time, to prevent having to recalculate everything from the beginning of time. 

- Can you imagine if you checked your bank statement and it could only show you your current balance… not how you reached that number?
Lawyers!

If a contract needs to be adjusted, is the contract thrown out and re-written? No. Rather, ‘addendums’ are placed on the contract.

To figure out what the contract actually says, one has to read the initial contract and then each successive addendum.
@svpember
Event Sourcing
• Alternate Storage Pattern -> Prevents Data Loss
• Store Deltas
• Additive Only
• Time Series
• It’s Old
• Easy To Implement
It promotes a highly functional style that's very easy to unit test.

If I need to get the current state of an entity, it's as simple as select * from event where id = x, then pass each event into an entity class, which builds itself up to current state using the functions outlined here.
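A hedged sketch of what such a unit test can look like (JUnit 5-flavored, reusing the hypothetical TodoList names from earlier); replay is a pure fold over events, so no database is involved:

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.List;
import java.util.UUID;
import org.junit.jupiter.api.Test;

class TodoListReplayTest {

    @Test
    void replaysEventsToCurrentState() {
        UUID listId = UUID.randomUUID();
        List<DomainEvent> events = List.of(
                new TodoListCreatedEvent(listId, "Groceries"),
                new TodoItemsAddedEvent(listId, List.of("milk", "eggs")));

        TodoList current = TodoList.from(events);   // the same replay path used at runtime

        assertEquals("Groceries", current.getName());
        assertEquals(2, current.getItems().size());
    }
}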
@svpember
Event Sourcing
• Alternate Storage Pattern -> Prevents Data Loss
• Store Deltas
• Additive Only
• Time Series
• It’s Old
• Easy To Implement
• … Difficult to Grasp
Now, this is where ES may start to hurt your brain
All Entities are Transient
Derivatives of the Event Stream
Objects are backed - ‘sourced’ -
by events
Which is just a fancy way of saying: All Objects are ‘backed’ or ‘sourced’ by various events from the Journal or Event Stream
@svpember
Now this has lots of powerful uses which we’ll get into in a bit, but regardless…
Can be
difficult for
Junior
Engineers
I’ve found that this entire concept can be a bit difficult for junior developers to grasp at first. So be aware of that. Internal Education can help dramatically.
But, why?
I’m sure that I’m making this all sound very attractive. You’re again probably asking yourself… ok, great… but why?

Ok, so why?
@svpember
Event Sourcing: Why
• Append-Only
@svpember
Event Sourcing: Why
• Append-Only
• Prevents Data Loss
Never Delete!
With Event Sourcing, no events are EVER deleted or updated, a nice side effect of the Append only nature
@svpember
Event Sourcing: Why
• Append-Only
• Prevents Data Loss
• Time Travel
@svpember
Event Sourcing: Why
• Append-Only
• Prevents Data Loss
• Time Travel
• Perfect For Time Series
@svpember
Event Sourcing: Why
• Append-Only
• Prevents Data Loss
• Time Travel
• Perfect For Time Series
• Automatic Audit Log ++
Built in, Automatic Audit Log for your Entities
Audit Logs tell the History
Events tell the Intent of History
Furthermore, having the events as a first-order member of your platform can give you enhanced information around what your users or systems are doing, beyond what might normally get written to the database.

You can make events that don't necessarily deal with the properties changed by a user, but with additional actions that may have occurred.

And it's easier to work with and analyze the data if the events are already integrated within your platform.
@svpember
One trivial example is this. 

One of our first ES systems was an internal User Management system, where our Program Managers (don't worry about these terms) track prospective ThirdChannel employees, which we call agents.

Our Managers wanted a way to get a history for each potential agent, and because of Event Sourcing, it was about 5 minutes of work to display the history of each agent like that.
@svpember
Event Sourcing: Why
• Append-Only
• Prevents Data Loss
• Time Travel
• Perfect For Time Series
• Automatic Audit Log ++
• Data Mining and Reporting
@svpember
And so you may be thinking, ok, great: I can get all the relevant events for Bob's shopping cart, but I'm only ever going to replay them all to get his current state!

And for much of the time, yes, users definitely want to know what their current state is.

The magic, though, comes with the business value. Businesses love time series data.
@svpember
Reporting in Event Temporality
• Look at ALL users' ProductRemoved events: which products are being discarded?
• Find ALL ProductAdded + ProductRemoved event pairings (i.e. the same product, user) that occur within 5 minutes: perhaps a user is hesitant about purchasing… maybe offer them a discount?
• Find the avg time between ProductAdded and OrderPlaced: how long are users sitting with non-empty carts?
• Find trends in all of the above
Because you have access to all the data changes that have happened, along with their time relationships… you can get extremely creative.

Here are a few ideas for reports on just users' shopping cart events…
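As a hedged sketch of the third idea (hypothetical event types; it assumes the journal is already ordered by time and that every event exposes its cart id and timestamp):

import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// "Average time between adding a product and placing the order" is one pass over the journal.
static Duration averageTimeToPurchase(List<CartEvent> events) {
    Map<UUID, Instant> firstAddedAt = new HashMap<>();   // cartId -> first ProductAdded time
    List<Duration> waits = new ArrayList<>();

    for (CartEvent event : events) {
        if (event instanceof ProductAddedEvent) {
            firstAddedAt.putIfAbsent(event.getCartId(), event.getOccurredAt());
        } else if (event instanceof OrderPlacedEvent && firstAddedAt.containsKey(event.getCartId())) {
            waits.add(Duration.between(firstAddedAt.remove(event.getCartId()), event.getOccurredAt()));
        }
    }
    if (waits.isEmpty()) {
        return Duration.ZERO;
    }
    long averageMillis = (long) waits.stream().mapToLong(Duration::toMillis).average().orElse(0);
    return Duration.ofMillis(averageMillis);
}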
Business
Types love
Reports
I assure you, if you start showing off or even hinting to your product research teams, your product owner teams, etc that these capabilities could exist
within your platform… they will get very excited
Collecting and applying your events like this is known as a ‘projection’. You’re taking events from your various event streams and projecting them into or
onto some data structure. 

You can also think of it as a Materialized View.

maybe just an image and talk about updating projections
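A hedged sketch of a projection (hypothetical names; in practice the 'view' would live in a table or cache rather than an in-memory map):

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// A projection listens to events and keeps a read-optimized view up to date incrementally.
public class CartActivityProjection {
    private final Map<UUID, Integer> itemsPerCart = new HashMap<>();   // the "materialized view"

    public void on(ProductAddedEvent event) {
        itemsPerCart.merge(event.getCartId(), 1, Integer::sum);
    }

    public void on(ProductRemovedEvent event) {
        itemsPerCart.merge(event.getCartId(), -1, Integer::sum);
    }

    public int itemCount(UUID cartId) {
        return itemsPerCart.getOrDefault(cartId, 0);
    }
}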
Is there a performance hit?
Save ALL the events?
So, that all being said..

Two questions I generally receive when talking about this…<read>

The answer to the first is yes, there is a little performance hit if you play events each time during a user request. However, we’ve found that it’s fairly
minimal for entities with fewer than a hundred or so events on them. It’s one SQL query to get all events for an entity to see current state, just as it would be
with an ORM. However, we typically tend to follow the CQRS methodology and build current state projections for run time querying that users see.

Now, for the answer to the second, let’s jump to the next section….
Questions?
Questions before I proceed?
Event Storage
Ok, With that all in mind, let’s proceed to the next section, Event Storage… let’s talk about how we store all these events.

First, I want to address the two questions I left on.
First, in an event-oriented world, you're going to acquire quite a few events. It's going to seem pretty messy after a while. You'll end up with perhaps larger database sizes than you originally thought, though remember this is largely because you're now tracking a third dimension, time, within your data store.

And really… data storage is CHEAP. Most of us are not Netflix or Facebook, and the scale of events we'll be working with is very manageable.
Now, of course, if this is at all bothersome, you can adopt a compaction strategy. The most well known is snapshotting, where you compress related events older than, say, two years into one object, then extract the raw events and put them into cheaper long-term storage like AWS Glacier. Still never throw them away, though.
@svpember
You can also use snapshotting more frequently, as a mechanism to alleviate performance troubles.

Make a snapshot on some interval, say every week… or every 100 events. You load the snapshot first, then find all events since the snapshot was taken. The issue here is that this adds an additional query to fetch the snapshot, so it's only worthwhile if processing the number of events you have takes longer than the additional database query. If your events are pure functions, it will take a fairly high number of events to be worse than the db query.
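A hedged sketch of that load path (hypothetical snapshot and repository names, reusing the TodoList entity from earlier):

import java.util.List;
import java.util.Optional;
import java.util.UUID;

// Load the most recent snapshot, then replay only the events recorded after it.
public TodoList loadCurrentState(UUID listId) {
    Optional<TodoListSnapshot> snapshot = snapshotRepository.findLatest(listId);

    long fromRevision = snapshot.map(TodoListSnapshot::getRevision).orElse(0L);
    List<DomainEvent> newerEvents = eventRepository.findByEntityIdAfterRevision(listId, fromRevision);

    // Start from the snapshotted state (or an empty entity) and apply only the newer events.
    TodoList state = snapshot.map(TodoListSnapshot::getState).orElseGet(TodoList::new);
    for (DomainEvent event : newerEvents) {
        if (event instanceof TodoListCreatedEvent) {
            state.apply((TodoListCreatedEvent) event);
        } else if (event instanceof TodoItemsAddedEvent) {
            state.apply((TodoItemsAddedEvent) event);
        }
    }
    return state;
}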
Event Schema
Anyway, let's discuss what an event looks like on disk.

And by that, I mean… physically, what does our database schema look like?

With event sourcing, there's no real 'correct' way of doing things. You can use any type of data storage to store your events, although we at ThirdChannel prefer Postgres and have had some experience with Cassandra.

Anyway. I think there are generally two approaches to what an event looks like on disk.

It's either:
@svpember
Table per Event:

a specific schema for the properties of each event
list fields

An event, at minimum, must have some identifier (we use UUIDs), a link to the entity id that this event belongs to, the revision number of the event (this helps keep events ordered and helps guard against concurrent-write problems via optimistic locking), and the id of the user who made the change.

Old price, new price, and currency are all specific to this event.
@svpember
Or, one event table. In this scenario, as you may imagine, ALL your events are in one table. It's easy to, say, shard this table by date or something, but basically: one event table.

Here, our events are essentially schema-less. Or rather, there is an implicit schema: as your events are parsed or deserialized from disk, the application maps them onto concrete event classes, so the schema lives in the application code rather than in the database.
example db schema

Does anyone here not know about the jsonb datatype?

Seriously, write this down. Go look at this thing. Switch to using Postgres entirely for it.

It's a better document store than MongoDB. I don't like MongoDB very much, although admittedly I haven't given it a fair shake these past few years.

True story: Monday a friend of mine
@svpember
Recommend: One Event Table
• No Migrations, past the first (+)
• Trivial to Add New Events (+)
• Selecting Multiple Event Types for a single Entity in one Query, no joins (++)
• ProTip: Use Postgres and the jsonb data type (+)
• Querying across multiple event types with no joins (+)
• Zero to Minimal Database Constraints (-)
My recommendation is One event table…

+ and -
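To show what that 'implicit schema' means in practice, here's a hedged sketch of mapping a generic event row (an event_type column plus a jsonb payload) back onto typed event classes with Jackson; the event names and the DomainEvent type are hypothetical:

import com.fasterxml.jackson.databind.ObjectMapper;

// One generic event row in, one typed event object out, chosen by the row's event_type value.
public class EventRowMapper {
    private final ObjectMapper mapper = new ObjectMapper();

    public DomainEvent toEvent(String eventType, String jsonPayload) throws Exception {
        switch (eventType) {
            case "PriceChanged":
                return mapper.readValue(jsonPayload, PriceChangedEvent.class);
            case "ProductAdded":
                return mapper.readValue(jsonPayload, ProductAddedEvent.class);
            default:
                throw new IllegalArgumentException("Unknown event type: " + eventType);
        }
    }
}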
Data Locality & Service
Lifecycle
This title may be strange but bear with me.

So, now that we've chosen the schema for events… the question still stands: how does this change in light of a distributed environment? If I've been arguing that we can have multiple data stores within our system, and each of our services is generating events… well, where should these events physically live?
@svpember
It's a bit of a spectrum.

Given the assumption that each service has a data store, I think there are two basic 'pure' strategies for event storage.
Service - Local Storage
On one end, each service is responsible for storing a certain set of domain events. Basically, anything that a service emits it should also store. This of
course requires that each service have its own datastore and generally will operate in a way you may be accustomed to. In addition to the events, you’ll
likely have models or materialized views representing the current state that are updated by the events
Central Store
	 •	 On the other end is the central event store. It has some mechanism to listen to all events that are emitted within your system and then save them to one general data store. Conceptually it's one service that writes to that store (single writer), but it can handle read requests from other services (single writer, multiple readers).

	 ◦	 What's an interesting architectural side effect of this pattern is that it opens up your services to not needing a local data store at all… they would be entirely event driven and hold their entire state in a local cache. It may sound crazy, but it's entirely feasible in this structure.

Anyway. Which approach to pick? … I don't think there's a right answer; it's really what's comfortable for you and your team. What can help, though, is looking at different lifecycle moments your services can go through, to better illustrate each approach.
@svpember
Event Storage Workflow Scenarios
• How Does a Service access events?
First, the most basic. What happens when a service wants to access events?
@svpember
Distributed (Local)
How do we query? The answer should be fairly obvious:
@svpember
Distributed (Remote)
@svpember
Central Store
<walk through>

In this scenario, I would advocate skipping routing the requests through the message queue. The central store should be fairly prominent; direct TCP/HTTP queries should be just fine.

With the distributed scenario, services may come and go, and you typically will broadcast the query to receive events without knowing exactly who or what contains those events. With the central store, you do know, so you can typically skip the Message Queue if you'd like.
@svpember
Event Storage Workflow Scenarios
• How Does a Service access events?
• What happens when we bring up a new service?
What happens when we bring up a new service?

In this scenario:

- new service

- empty

- needs various events from different domains in order to bring itself into alignment with the current state of the other services
@svpember
Distributed
@svpember
Distributed
service appears. I need a,b,c!. responses
@svpember
Central Store
service appears. queries central store
@svpember
Event Storage Workflow Scenarios
• How Does a Service access events?
• What happens when we bring up a new service?
• What happens when a service misses or fails to process an event?
What happens when a service misses an event?

I'd argue that it's difficult to 'miss' an event:

if you're using a message broker, it should hold on to messages until services can read them.

But perhaps there's some catastrophic failure and you lose messages on the queue.

What's more likely is that your service won't know how to handle a particular event when it arrives.

What we're really talking about here is the ability to reprocess events.
@svpember
Distributed
It’s basically the same as when a service comes online, although you’ll need a smaller set of data
@svpember
central
@svpember
Event Storage Workflow Scenarios
• How Does a Service access events?
• What happens when we bring up a new service?
• What happens when a service misses or fails to process an event?
• What about out-of-order events?
@svpember
Event Storage Workflow Scenarios
• How Does a Service access events?
• What happens when we bring up a new service?
• What happens when a service misses or fails to process an event?
• What about out-of-order events?
• What is the process for decommissioning a service?
@svpember
Decommission - Distributed
• Are we bringing up a new service?
• Are we simply killing functionality?
• Don’t get rid of the events!
With Distributed, you need a plan for what to do with the events held by the system you're shutting down.

- If you're replacing an old service with a new, refactored version of it, you should be fine, so long as the new version knows how to respond to the same requests for data and to handle the same commands and events.

- If you're killing the service entirely, that likely means the functionality is also going away.

- Still, something needs to be responsible for the events, to support old requests.

- Consider offloading the events and any query mechanisms to the most closely related service.
@svpember
Decommission - Centralized
• … just delete the service
With Centralized, the process is much easier. If you know the service isn't needed anymore… well, it's gone. The events it was responsible for creating are still in the central store should you need them.
Recap
@svpember
Per-Service Storage
• Less infrastructure
• Proper containment of events
• Requires multiple event consumers and ‘rebroadcast’ mechanisms
@svpember
Centralized Storage
• Much larger footprint
• Convenient
• Violates ‘self-containedness’ and distribution of events
• Easier for mining purposes
• One rebroadcast mechanism
Managing Events
Event Versioning
Event Storming
@svpember
Event Storming Building Blocks
• Events
• Reactions - “Whenever an account is created, I need to send an email”
• System Commands
• User-Initiated Commands
• External Systems
• Policy - Flow of Events and Reactions
Event Storming- Policy
Conclusion: Yay, Events.
I realized that this talk ends rather abruptly. So here’s a slide to pace things out.

Events are great.
Thank You!
@svpember
spember@gmail.com
@svpember
@svpember
Images
• Ant with stick: https://www.reddit.com/r/photoshopbattles/comments/1uh80p/perfectly_timed_photo_of_an_ant_lifting_a_stick/
• Why - man with bowtie: https://silktide.com/dear-ico-this-is-why-web-developers-hate-you/
• Why - Ryan Reynolds: http://www.reactiongifs.com/r/but-why.gif
• Why - Jon Stewart: https://www.huffingtonpost.com/2013/10/25/jon-stewart-apologizes-for-us_n_4162980.html
• EventStorming: https://blog.redelastic.com/corporate-arts-crafts-modelling-reactive-systems-with-event-storming-73c6236f5dd7
• Fireworks: https://commons.wikimedia.org/wiki/File:Canada%27s_fireworks_at_the_2013_Celebration_of_Light_in_Vancouver,_BC.jpg
• Mad Developers: https://www.hbo.com/last-week-tonight-with-john-oliver
• Workers removing delete key: https://gcn.com/articles/2015/03/31/deleted-emails.aspx
@svpember
Further Reading
• Event Versioning: https://leanpub.com/esversioning/read
• Event Storming: https://blog.redelastic.com/corporate-arts-crafts-modelling-reactive-systems-with-event-storming-73c6236f5dd7

 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Event storage in a distributed system

  • 12. As an aside… Reacting to events may be nothing new to Javascript or frontend developers • Your browser’s DOM, Javascript, and I suppose UIs in general are full of events. Literally anything you do on the browser generates an event. Move the mouse, click a box, type a letter, let go of a letter, etc. • While the knowledge of this talk is transferable to the frontend to some extent, the majority of this talk is focused on the server side. The server side doesn’t traditionally deal with a lot of events, I’d say, particularly if you started your dev life and career with big frameworks like Rails, Grails, Django, etc.
  • 13. It’s not hard to do, to program in terms of events. Generally, one or more events are created when a user successfully performs an action or Command. (Slide of various event names; will reuse this aesthetic later.) They represent successful deltas or actions that have occurred in the past. Now, here, we have code which accepts some incoming Command object to create a new TodoList, validates it, generates two events based on that command, saves the events, saves a projection of the current state of the TodoList entity (though this is optional, as I could recreate the state of the todo list entirely from the events), and transmits them.
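The code on that slide isn’t captured in this transcript, so here is a rough, minimal sketch of the idea in Java (a recent JDK is assumed; names like CreateTodoListCommand, TodoListCreatedEvent, EventStore, and EventPublisher are hypothetical stand-ins, not the talk’s actual code): validate the command, derive events, persist them, and transmit them.

```java
import java.time.Instant;
import java.util.List;
import java.util.UUID;

// Hypothetical command and the events derived from it
record CreateTodoListCommand(UUID userId, String name, List<String> initialItems) {}
record TodoListCreatedEvent(UUID listId, UUID userId, String name, Instant occurredAt) {}
record ItemsAddedEvent(UUID listId, List<String> items, Instant occurredAt) {}

// Assumed collaborators: an append-only store and a publisher for other interested parties
interface EventStore { void append(UUID entityId, List<Object> events); }
interface EventPublisher { void publish(Object event); }

class TodoListCommandHandler {
    private final EventStore eventStore;
    private final EventPublisher publisher;

    TodoListCommandHandler(EventStore eventStore, EventPublisher publisher) {
        this.eventStore = eventStore;
        this.publisher = publisher;
    }

    List<Object> handle(CreateTodoListCommand command) {
        // Validate the incoming command
        if (command.name() == null || command.name().isBlank()) {
            throw new IllegalArgumentException("A todo list needs a name");
        }
        UUID listId = UUID.randomUUID();

        // Two events are generated from this one command
        List<Object> events = List.of(
            new TodoListCreatedEvent(listId, command.userId(), command.name(), Instant.now()),
            new ItemsAddedEvent(listId, command.initialItems(), Instant.now()));

        // Persist the events (a current-state projection could also be saved here,
        // though that is optional: the state can always be rebuilt from the events)
        eventStore.append(listId, events);

        // Transmit them to other interested parties
        events.forEach(publisher::publish);
        return events;
    }
}
```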
  • 14. My domain objects, or ‘Entities’, start becoming highly functional as they acquire methods to manipulate them by applying events. This is certainly not production code, but you can see how my entities start acquiring handlers on themselves that, when provided with various events, know how to use each event to update their internal state. In production, I’d probably have the entity use an internal mutable builder, and the builder receive the events, which would then spit out a validated, immutable Entity. But alas, this is an example.
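Again, the slide’s actual code isn’t in the transcript, so here is a minimal sketch, with hypothetical event names, of an entity that applies events to itself. As noted above, production code would more likely funnel events through a mutable builder that emits an immutable entity; this is the simple mutable version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Illustrative events; not the talk's actual classes
record TodoListCreatedEvent(UUID listId, String name) {}
record ItemAddedEvent(UUID listId, String description) {}
record ItemCompletedEvent(UUID listId, String description) {}

class TodoList {
    private UUID id;
    private String name;
    private final List<String> openItems = new ArrayList<>();
    private final List<String> completedItems = new ArrayList<>();

    // Each handler knows how to use one event type to update internal state
    TodoList apply(TodoListCreatedEvent event) {
        this.id = event.listId();
        this.name = event.name();
        return this;
    }

    TodoList apply(ItemAddedEvent event) {
        openItems.add(event.description());
        return this;
    }

    TodoList apply(ItemCompletedEvent event) {
        openItems.remove(event.description());
        completedItems.add(event.description());
        return this;
    }
}
```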
  • 15.
  • 16. @svpember A slide back I mentioned that Events need to be transmitted. Well, these events need to be seen by others to be useful. It’s one thing to have my entity only see its own events, but it’s entirely another thing to share… and mix events from across the system. And so, we need some method of transmitting these events to other interested parties, both internal and external. For internal, this typically involves some asynchronous publish/subscribe mechanism. Tools that I’ve used successfully for this purpose have included a library called Project Reactor, and Reactive Streams via the RxJava library.
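The transcript doesn’t show how that internal pub/sub is wired up, so here is a hedged sketch of an in-process event bus using Project Reactor’s Sinks API (Reactor 3.4+; in 2018 the equivalent would have been a Processor). The class and method names are mine, not the talk’s.

```java
import reactor.core.publisher.Flux;
import reactor.core.publisher.Sinks;

// A minimal in-process publish/subscribe mechanism for domain events
class InternalEventBus {
    private final Sinks.Many<Object> sink = Sinks.many().multicast().onBackpressureBuffer();

    void publish(Object event) {
        sink.tryEmitNext(event);
    }

    // Subscribers ask for just the event types they care about
    <T> Flux<T> eventsOf(Class<T> type) {
        return sink.asFlux().ofType(type);
    }
}

// Hypothetical usage: another module reacts asynchronously to events it observes, e.g.
//   bus.eventsOf(ItemAddedEvent.class).subscribe(event -> updateSomeProjection(event));
```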
  • 17. @svpember Externally, these events can be transmitted either via point-to-point HTTP or via some asynchronous message queue… which, as we’ll see later, is my preferred method.
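Since RabbitMQ comes up later in the talk as the broker of choice, here is a hedged sketch of publishing a serialized event to a durable topic exchange with the standard RabbitMQ Java client. The exchange name, routing-key convention, payload, and connection details are assumptions for illustration only.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;
import java.nio.charset.StandardCharsets;

public class ExternalEventPublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumption: broker location

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {

            // A durable topic exchange lets consumers bind only to the event types they care about
            channel.exchangeDeclare("domain-events", "topic", true);

            String json = "{\"listId\":\"1234\",\"description\":\"buy milk\"}"; // serialized event
            channel.basicPublish(
                "domain-events",
                "todolist.item-added",                   // routing key per event type (assumed convention)
                MessageProperties.PERSISTENT_TEXT_PLAIN, // persistent, so the broker keeps it if consumers are down
                json.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```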
  • 18. At this point you might be saying… ok, cool, but why? That’s one point of criticism I often get for my talks. I mention all these great things I’m working with but neglect to really hammer home the ‘WHY’ of it all. So… why? Why should you care about any of this? So far it just looks like I’m adding a bunch of extra hassle for you.
  • 19. The reason is that events, these smaaaaallll bits of information are collectively extremely powerful. The point of this presentation, the synopsis, the end goal, is to try and show you that tracking events, persisting them, and treating them as first-class citizens within your system is a wise idea with loads of potential benefit. AND that there are some caveats to be aware of when we talk about how to store these events and work with them within a distributed environment. However, there are some steps in the way to get there.
  • 20. @svpember Overview • Event-Oriented Distributed System Architecture Today, we’re going to discuss some information that I’ve broken down into the following topics: <read> The architectural background of this talk. This is an architectural conference, and it’s important. This will cover some concepts and architectural designs to help prepare your systems to think in terms of events. And because we’re architects, we’ll probably have some boxes and lines drawn up on the screen, because it wouldn’t be an architectural presentation without good ol boxes and lines
  • 21. @svpember Overview • Event-Oriented Distributed System Architecture • The Power of Events after we get in the mindset of working with events and architecting our systems to operate in an event-first fashion, we’ll look into why you should be excited about having events laying around as first-class citizens within your app. I do this topic second, as I think the architecture portion is the harder pill to swallow… plus I think that getting your head around the existence of events makes it easier to start to see their usefulness. Although I could be wrong, let’s give it a go.
  • 22. @svpember Overview • Event-Oriented Distributed System Architecture • The Power of Events • Event Storage & Lifecycle After that, we’ll discuss the impact of storing events, different patterns for doing so, and some lifecycle concerns for services in an event based environment
  • 23. @svpember Overview • Event-Oriented Distributed System Architecture • The Power of Events • Event Storage & Lifecycle • Day to Day Concerns / Working with Events And then finally, some additional details of working with events that don’t necessarily have to do with storing them. Alright… let’s begin
  • 24. Let’s start with Microservices First, a few minutes on Microservices
  • 25. I’ve borrowed this slide before. Thanks Alvaro! I like it because it’s honest: you start with a mess, and if you’re not careful you end up with a distributed mess. Anyway… • How many of you have attended SACon before? This is, I think, my third or fourth time over 4 years. I’m pretty sure the first two years were almost exclusively talks about microservices. I know I contributed to it, eh? And there’s a good reason: this notion of Microservices has been a great transformational thing in software development and architecture. Even if you think it’s a rehash of SOA, it still has been promoting the virtues and popularity of distributed systems with the larger community… which I think is a good thing. • Now… Just to get a general poll… who’s working with them? And who here likes working with them? Any hands go down… aw, some jaded folks.
  • 26. @svpember The Promise of Microservices • Reduced complexity per service • Easier for developers to understand a single service • Teams work with more autonomy • Independent scaling, deployments, builds • Fault isolation • “Right tool for the job” • Isolation and decoupling • Continuous Delivery and Deployment The promise of Microservices is very alluring, yeah? Right out of the gate, we immediately reduce the complexity of our codebase by making it several smaller codebases. My favorite: it allows teams to work with great autonomy, with improved isolation and decoupling. It allows for independent scaling of services. The most powerful is that microservices allow for Continuous Delivery and Continuous Deployment. Which honestly is… I think, the pinnacle of efficiency a software dev team should be striving for. Now, I’m being a bit flippant about that because we should be concerned about testing and regressions of our releases, of course… but all of this is another topic entirely.
  • 27. @svpember … Some Caveats • Vastly increased infrastructure complexity • So much Ops • Teams need to handle all lifecycle steps of service deployment • Conceptual difficulty with multiple service deployments • Potential performance hits for intra-service comms As useful and as powerful as all of that is… there are absolutely some tradeoffs when using microservices. You immediately… IMMEDIATELY have increased complexity in your infrastructure. And going back to my point that microservices have been good for growing awareness within the community… the rise of tooling in this space has just been insane. Kubernetes, HashiCorp’s entire business model… it’s great stuff. The point here is that if your team isn’t ready to shift the complexity of the codebase into infrastructural and ops complexity, you should probably hold off.
  • 28. A few things bothered me… That being said, I still think that the Microservice approach is very useful. However, as it’s been growing, three points have always bothered me that I never felt were fully discussed or agreed upon in the presentations I’ve seen and material I’ve read. Just as a preview… did everyone see the keynotes yesterday morning? Cornelia Davis started out her presentation listing issues with distributed systems, and it spoke to my soul, man. She basically gave this talk already.
  • 29. @svpember Questions about Microservices • How should they communicate? First: how should these services communicate?
  • 30. @svpember You see, when the term ‘microservice’ became all the rage, it was my observation that people were building services which utilized point-to-point synchronous HTTP comms to query, post, etc. data between services. There’d be service discovery systems in play, necessary to make services aware of each other’s existence. These synchronous calls utilize resources (e.g. threads), block, take time… and if a service goes down, what happens if another service is reliant upon it?
  • 31. @svpember (Slide of multiple services being needed to support a single hop.) And I’m aware of Netflix’s Hystrix and other circuit-breaker technologies to help with all of this, but it still seems a lot could go wrong in that chain.
  • 32. Time to Go Reactive To address point 1, I suggest embracing a design pattern known as ‘Reactive’
  • 33. @svpember Reactive Systems • Communication between services driven by asynchronous events Has anyone here heard of the ‘Reactive Manifesto’? I’m a big fan of it, but I’m going to put a bit of a spin on its tenets to fit my narrative here. Anyway, when I say ‘Reactive’, we don’t mean ‘React.js’ or Reactive Streams (though I love both of those things). It’s a design philosophy to apply to systems to help them achieve high scalability. The first rule for Reactive Systems is that… On a positive note, based on what I’ve been seeing over the past year or so, the collective opinion is moving away from entirely HTTP to be more event driven, which is great. There are, for example, several talks at this conference on this very subject.
  • 34. @svpember Reactive Systems • Communication between services done by asynchronous events • Services ‘React’ to observed events Anyway, point two!
  • 35. @svpember Reactive Systems • Communication between services done by asynchronous events • Services ‘React’ to observed events • Use some Message Broker technology to promote Async and reduce Data loss
  • 36. @svpember Reactive Systems • Communication between services done by asynchronous events • Services ‘React’ to observed events • Use some Message Broker technology to promote Async and reduce Data loss • Synchronous HTTP calls between services kept to a minimum By reducing the # of synchronous calls, we gain two main benefits: - less resource contention on the thread pools of each of our services - a firewall-like effect: if services die, they don’t cause other systems to fail or have to rely on fallback circuit-breaker code - we can reduce the number of calls by collecting data from other services One side effect of this and the previous points is that your platform should become quite fast, as well as have… <next>
  • 37. @svpember Reactive Systems • Communication between services done by asynchronous events • Services ‘React’ to observed events • Use some Message Broker technology to promote Async and reduce Data loss • Synchronous HTTP calls between services kept to a minimum • Resiliency against failing services - By reducing or eliminating runtime dependencies on other services, each service can function in isolation and will not be brought down by failing services. And that’s my basic overview of Reactive systems. Everyone with me so far? No? Too bad, let’s keep going.
  • 38. @svpember Questions about Microservices • How should they communicate? • How “large” should a micro service be? • How much responsibility should a single service have? Now, the next two… <read> I feel can be solved or addressed by an architectural design pattern known as…<next>
  • 39. Domain Driven Design Command Query Responsibility Segregation Domain Driven Design (or DDD) and a related variant called Command Query Responsibility Segregation (aka CQRS) are two architectural patterns intended for very complex systems. These are not trivial things to understand, and you should take care if you plan to adopt them. They have great power, though.
  • 40.
  • 41. - Eric Evans “Some objects are not defined primarily by their attributes. They represent a thread of identity that runs through time and often across distinct representations. Sometimes such an object must be matched with another object even though attributes differ. An object must be distinguished from other objects even though they might have the same attributes.” One of the most interesting parts of DDD, one that really stuck with me, is this quote: <read quote> - that’s interesting, yeah? If I change my name, am I no longer me? Of course not, I’m defined by more than my name. If I change my email address, or my address, or my social security number for some reason, am I no longer me? Obviously not… Can your database understand identity changes like this and still be able to find the original object?
  • 42. @svpember Domain Driven Design • Ubiquitous Language DDD has several interesting ideas besides that quote of course. When building a system adhering to DDD, it offers several guidelines. The first: Ubiquitous Language. Everyone in the company should be speaking in the same terms. The same concepts. Your classes and objects should reflect the Language. Marketing should be using the same terms as Sales and as Engineering. When Engineering builds a new Feature or new Service the entire company should be using it. If you’re an e-commerce app and product management decides to name the Product Catalog… ah… ‘Zephyr’ for some reason, Engineering is also calling it Zephyr, and there better be a Zephyr.java file somewhere in the repo.
  • 43. @svpember Domain Driven Design • Ubiquitous Language • Entities / Value Objects
  • 44. Entity Objects are those that you truly care about tracking. These objects will have specific identifiers and you will take great care in maintaining their individuality and their relationship to other objects.
  • 45. Value Objects: when you know you have a lot of something, but you don’t care about the identity of each individual item. The example that I believe is given in the DDD book is that an automobile Engine might be an Entity (it has a unique serial number that mechanics might care to track), while the wheels… well, the car has 4, and I don’t necessarily assign any uniqueness to them.
  • 46. @svpember Domain Driven Design • Ubiquitous Language • Entities / Value Objects • Aggregates Third term: Aggregates
  • 47. Group of Entities With One Root … the aggregate Root. The root acts as the parent or entry point when referencing the aggregate, which leads to the next point:
  • 48. @svpember Domain Driven Design • Ubiquitous Language • Entities / Value Objects • Aggregates • Bounded Contexts powerful concept logical grouping of related functionality
  • 49. @svpember The key point here is that Objects inside an Aggregate may hold references to other Roots. But only the root id. They may not hold any information about entities below the root within that context.
  • 50. Combine and Isolate related objects into Modules It’s a natural step then in your code to ensure that you combine and isolate all related objects within a context into one module.
  • 51. No Direct comms across boundaries And, no direct communications are allowed across context or module boundaries. Well if that’s true, what do I do if I need information from across boundaries?
  • 52. @svpember Domain Driven Design • Ubiquitous Language • Entities / Value Objects • Aggregates • Bounded Contexts • Domain Events
  • 53. @svpember Here we have several of our Modules - whose names mirror our Ubiquitous Language, btw. They cannot talk to each other directly, but rather through some intermediary mechanism. Now, it could be direct message passing, but this differs from importing and calling methods directly in the module. You could use a Pub Sub Mechanism, or some sort of message broker… etc, etc. The important bit is that the modules are bounded away from each other.
  • 54. Events Are Transactionally Safe That is, no Events are emitted until an item is successfully saved to disk. The event is part of the transaction.
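One way to keep the event inside the transaction is to write it to the same database, in the same unit of work as the entity change, and only publish after a successful commit. A minimal JDBC sketch of that idea, with assumed table and column names (not the talk’s actual code):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

class TransactionalEventWriter {

    interface EventPublisher { void publish(String eventJson); }

    // The entity change and its events are saved in one transaction; nothing is
    // published unless the commit succeeds. Table/column names are assumptions.
    void saveAndPublish(Connection connection, String entityStateJson,
                        List<String> eventsAsJson, EventPublisher publisher) throws Exception {
        connection.setAutoCommit(false);
        try {
            try (PreparedStatement saveEntity = connection.prepareStatement(
                    "insert into todo_list_projection (state) values (?::jsonb)")) {
                saveEntity.setString(1, entityStateJson);
                saveEntity.executeUpdate();
            }
            try (PreparedStatement saveEvent = connection.prepareStatement(
                    "insert into events (body) values (?::jsonb)")) {
                for (String event : eventsAsJson) {
                    saveEvent.setString(1, event);
                    saveEvent.executeUpdate();
                }
            }
            connection.commit();
        } catch (Exception e) {
            connection.rollback(); // a failed save means no events are emitted
            throw e;
        }
        // Only emit once the data is safely on disk
        eventsAsJson.forEach(publisher::publish);
    }
}
```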
  • 55. @svpember Domain Driven Design • Ubiquitous Language • Entities / Value Objects • Aggregates • Bounded Contexts • Domain Events • One Last Takeaway…
  • 56. Wait… what if I need to share across contexts? This may not be phrased correctly , but the answer to this is one of the most powerful aspects of DDD and one of the hardest to get used to. Bounded contexts are isolated, autonomous components with their own entities, classes, service objects, etc…. However, all the bounded contexts exist within the same system, and certain concepts or Entities will likely exist throughout the entirety of a system… although each context may only care about a subset of the info about that entity. Or, another way to phrase it: each bounded context is only concerned with some subset of an Entity within a system, and no context will know the entire set of information about an entity. This separation is the concern of the context’s boundary
  • 57. @svpember - For example! The catalog context knows how much inventory is left for this particular SKU, but the shopping cart and the admin context don’t necessarily need the information. The inventory count may be a function of a ‘Warehousing context’ that the catalog receives events for. Similarly, the Shopping Cart context contains the quantity, the # of this SKU that the user wants to purchase. That information has no bearing on the catalog context. This concept - that an entity can exist in multiple contexts, though each context is only concerned with a subset of that information - is very powerful, and very useful. Understanding which information belongs in which context… and maintaining that decoupling, is however, one of the toughest aspects of DDD, and can be a challenge for younger developers or those newer to ddd to grasp. For example, we recently had a situation…
  • 58. And now, CQRS Command Query Responsibility Segregation is an evolution of DDD that calls for changes in how one accepts and sends data.
  • 59. MVC With your standard MVC/ CRUD style approach that you get out of the box with many big frameworks, the pattern generally is something like the following: - user makes a change via the ui, let’s say to a Product object - There’ll likely be a Product Controller which takes that data - passes it to a Product Model - which in turn gets validated and saved to the database, likely in a table named ‘product’ When the user wants to retrieve information about the product, the same objects are used. Query for product with id X, product controller uses a product model to retrieve data from the product row, then passes the retrieved model up to the ui
  • 60. CQRS CQRS says: why use the same objects for every task? It makes a hard distinction between modifying actions the user is attempting to do - which it calls ‘Commands’ - and data retrieval tasks - aka Queries - and thus breaks up the underlying code to enforce that distinction. So, if a user wants to make a change to some data, say again, a Product, he manipulates the change in the UI, maybe clicks a button, and a relevant controller converts that request into a ProductChangeCommand object or model, which contains details on what the user is trying to change. That command is then validated, then the changes are persisted AND, while not shown here, domain events are emitted. As for Querying, a User dictates the query, it’s packaged by the query controller into a query object, results are pulled from the database, and the response is returned to the user. While it reads similarly, it’s important to understand that the models are different, and often advantageously so. The Product information I display to the end user may be only a subset of the Product model / entity object, so I only pull a few fields from the db. Or, my query object represents a composite of several models in one multi-faceted report that is pulled in one query.
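To make the command/query split concrete, here is a small hedged sketch (names are illustrative, not from the talk): the command model captures intent and gets validated, while the query side returns a slim read model shaped for one screen rather than the full Product entity.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.UUID;

// Command side: captures what the user is trying to change
record ChangeProductPriceCommand(UUID productId, BigDecimal newPrice, UUID requestedBy) {}

// Query side: a read model shaped for a particular view, not the full entity
record ProductSummaryView(UUID productId, String name, BigDecimal price, boolean inStock) {}

interface ProductCommandHandler {
    // Validates the command, persists the change, and emits domain events
    void handle(ChangeProductPriceCommand command);
}

interface ProductQueryService {
    // Reads from a targeted query model/projection; never touches the write model
    List<ProductSummaryView> searchByName(String nameFragment);
}
```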
  • 61. CQRS Following that line of thinking, we can extend this a bit further. We can isolate our writes and our reads into separate contexts or services. Besides the nice decoupling, it allows us to get creative in other areas: - want to scale out our write capability vs our reads? No problem, just scale up one of those services - if our write service emits domain events when it saves, could our query service listen for other domains’ events? Sure! - want to have custom query reports that pull from multiple domains? No problem! You can build multiple query models that are highly targeted towards whatever end-user report or experience you’re trying to deliver.
  • 62. Allows for interesting Architectures Continuing that line of thinking, it allows for some very interesting Architectures
  • 63. This is a graphic taken from Udi Dahan’s website; he’s another pioneer in the CQRS space. What this diagram is trying to depict is similar to what I’ve been describing. The blocks labelled ‘AC’ stand for autonomous component, I think. Think of them like a service. So, in the bottom left, the user enters a command to the first service. It succeeds and the changes are written to local storage. Events are published and retrieved by one or many other services, who update their local query caches based on those events. Then, when the user performs a query, it hits that highly targeted query cache, giving the user the intended results with a minimum of SQL queries or joins.
  • 64. Ok then, still seems like a lot of trouble… so why.. why go through all this trouble? 
  • 65. @svpember Reactive + CQRS • System is ideal for both write- and read-heavy applications • writes are contained for a particular service / bounded context • scale up services receiving writes • create query caches specifically designed to handle queries • can scale those up too • Efficient Querying -> just going to highlight this again • One note: due to the distributed - and, as we’ll see soon, event-based - storage, if your company has analysts they will likely hate you. You’ll likely need to build a service or query store solely for them to run SQL queries against.
  • 66. @svpember Reactive + CQRS • System is ideal for both write- and read-heavy applications • Service Design via Bounded Contexts
  • 67. Bounded Contexts are an excellent tool to determine Microservice responsibility and potentially sizing of a service. i.e. how big should it be? when do we create a new one? We went through a team exercise where we tried to figure this out… - Each circle represents a context boundary - big outer circle is 3c itself - four big inner circles are different functional areas of our company - small inner inner circles represent contexts further still What we found was that our services for the most part mapped to the smaller circles, which was great. although there was much duplication (e.g. services belonging to multiple contexts) and some we identified that should be combined. those are both bad.
  • 68. @svpember Reactive + CQRS • System is ideal for both write- and read-heavy applications • Service Design via Bounded Contexts • Failure Protection
  • 69. @svpember • For a sufficiently large system, something is always going to be in a failure state • reducing or eliminating calls between services when handling a Command or a Query eliminates the dependency on that or those additional services • Service being queried will still function • One of the tenets of the ‘Reactive Manifesto’; provides a stopgap for failures impacting the user. • For example, if the Shopping Cart Management service is down, my product catalog service should not be affected and the user should still be able to browse the catalog • Additionally, using a durable Message Broker for communication grants additional layers of protection. We use RabbitMQ, but loads of folks have great success with tools like Kafka. These tools will hold on to messages, allowing consuming services to consume them at their leisure. This has advantages in situations where a service is down for a period of time. Or, imagine a service cannot handle a message and the devs need to fix it; the message waits on the queue until the service is back online, resulting in no data loss. (Image: broken-down service, happy product catalog.)
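To illustrate the “consume at their leisure” point, here is a hedged sketch of a consumer using the RabbitMQ Java client with manual acknowledgements: the broker holds a message until the service explicitly acks it, and a message the service cannot yet handle can be requeued for later reprocessing. Queue, exchange, and binding names are assumptions for illustration.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;
import java.nio.charset.StandardCharsets;

public class DurableEventConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // Durable queue bound to the events this service cares about
        channel.exchangeDeclare("domain-events", "topic", true);
        channel.queueDeclare("catalog-service.events", true, false, false, null);
        channel.queueBind("catalog-service.events", "domain-events", "todolist.#");

        DeliverCallback onMessage = (consumerTag, delivery) -> {
            String body = new String(delivery.getBody(), StandardCharsets.UTF_8);
            try {
                // ... deserialize `body` and apply the event to local state / projections ...
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
            } catch (Exception e) {
                // Not acked: requeue so it can be handled once the problem is fixed
                channel.basicNack(delivery.getEnvelope().getDeliveryTag(), false, true);
            }
        };

        // autoAck = false: the broker keeps the message until we explicitly ack it
        channel.basicConsume("catalog-service.events", false, onMessage, consumerTag -> { });
    }
}
```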
  • 70. @svpember Reactive + CQRS • System is ideal for both write- and read-heavy applications • Service Design via Bounded Contexts • Failure Protection • Promotes Async comms • eliminating or reducing calls between services when handling Commands and Queries also eliminates blocking, synchronous calls to these services • reduces resource contention on thread pools • Also eliminates a potential failure vector: services ‘backing up’ with chains of communications
  • 71. @svpember Reactive + CQRS • System is ideal for both write- and read-heavy applications • Service Design via Bounded Contexts • Failure Protection • Promotes Async comms • “Simple” Testing • I put ‘simple’ in quotes there because no matter what I say next… this is still a distributed system we’re talking about - Each service can be heavily unit and integration tested in isolation from the others • Testing of the platform as a whole is important and useful. At least for us, a good chunk if not most of our bugs are contract violations in the incoming commands and events (e.g. I thought event A was emitting two fields but it’s actually three and I didn’t have JSON ignore properties set in Jackson, or a service misspelled a variable name, or someone changed event A’s fields.) • To me, this means that our organization is lacking communication and is violating the CAP theorem, or it’s being proven true, or however you want to say it. The point here is to make sure that other teams are aware of the shape of events emitted from your services.
  • 72. @svpember Reactive + CQRS • System is ideal for both write- and read-heavy applications • Service Design via Bounded Contexts • Failure Protection • Promotes Async comms • “Simple” Testing • Reduces cross-service Querying One of the most important reasons you do something like this? To avoid having to write queries that would otherwise span multiple services. Has anyone had to write queries that span multiple services? It’s a nightmare. So inefficient. So, while this approach may feel like you’re duplicating data or whatnot, it’s quite efficient.
  • 73. @svpember Reactive + CQRS • System is ideal for both write- and read-heavy applications • Service Design via Bounded Contexts • Failure Protection • Promotes Async comms • “Simple” Testing • Reduces cross-service Querying • Also… And it also leads to one of my absolute favorite concepts…
  • 75. @svpember Event Sourcing • Alternate Storage Pattern -> Prevents Data Loss
  • 76. @svpember More specifically, it’s an alternative to your standard ORM storage mapping, where an object in memory maps directly to a row in a database, even if that row may be split via joins. * An update is made to a model, which updates a column in your database. * In this method, the only thing you know about your data is what it looks like right now.
  • 77. @svpember Event Sourcing • Alternate Storage Pattern -> Prevents Data Loss • Store Deltas
  • 78. Store Deltas, not Current State
  • 79. @svpember This stream of events is persisted in our database in the order they occurred, as a journal of what has transpired. These events can then be played back against our Domain Object, building it up to the state it would be in at any given point in time, although this is most likely the Current State.
  • 80. @svpember Event Sourcing • Alternate Storage Pattern -> Prevents Data Loss • Store Deltas • Additive Only • Time Series
  • 81. Never Delete, Never Update It means you only ever insert data into your database. No event rows are ever, ever, ever updated or deleted. In so doing, you’ve now turned your events table into an append-only journal, which is very efficient for most databases. In other words, events are immutable!
  • 82. @svpember Event Sourcing • Alternate Storage Pattern -> Prevents Data Loss • Store Deltas • Additive Only • Time Series • It’s Old Rather, this basic idea of storing deltas, rather than just current state has been around for a long time
  • 83. - Every transaction you make with your bank. Every Credit or Debit made is logged, along with an audit trail of who (e.g. which teller) made the change. - To get your balance, your bank simply adds up each of these transactions - May also periodically record what the balance was at specific points in time, to prevent having to recalculate everything from the beginning of time. - Can you imagine if you checked your bank statement and it could only show you your current balance… not how you reached that number?
  • 84. Lawyers! If a contract needs to be adjusted, is the contract thrown out and re-written? No. Rather, ‘addendums’ are placed on the contract. To figure out what the contract actually involves, one has to read the initial contract and then each successive addendum, in order to figure out what the thing actually says.
  • 85. @svpember Event Sourcing • Alternate Storage Pattern -> Prevents Data Loss • Store Deltas • Additive Only • Time Series • It’s Old • Easy To Implement
  • 86. This promotes a highly functional style that’s very easy to unit test. If I need to get the current state of an entity, it’s as simple as select * from event where id = x, then passing each event into an entity class, which will build itself up to current state by using the functions outlined here.
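A hedged sketch of that replay, echoing the earlier hypothetical classes (recent JDK assumed): fetch the entity’s events in order and fold them over a fresh entity to arrive at current state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

class TodoListRepository {

    // Assumed store, e.g. backed by: select * from events where entity_id = ? order by revision
    interface EventStore { List<Object> eventsFor(UUID entityId); }

    record ItemAddedEvent(UUID listId, String description) {}
    record ItemRemovedEvent(UUID listId, String description) {}

    // Minimal entity with a dispatching apply method, just for this sketch
    static class TodoList {
        private final List<String> items = new ArrayList<>();

        TodoList apply(Object event) {
            if (event instanceof ItemAddedEvent added) {
                items.add(added.description());
            } else if (event instanceof ItemRemovedEvent removed) {
                items.remove(removed.description());
            }
            return this;
        }
    }

    private final EventStore eventStore;

    TodoListRepository(EventStore eventStore) {
        this.eventStore = eventStore;
    }

    // Current state is just the fold of all events, in order, over an empty entity
    TodoList currentState(UUID listId) {
        TodoList list = new TodoList();
        for (Object event : eventStore.eventsFor(listId)) {
            list = list.apply(event);
        }
        return list;
    }
}
```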
  • 87. @svpember Event Sourcing • Alternate Storage Pattern -> Prevents Data Loss • Store Deltas • Additive Only • Time Series • It’s Old • Easy To Implement • … Difficult to Grasp
  • 88. Now, this is where ES may start to hurt your brain
  • 89. All Entities are Transient Derivatives of the Event Stream
  • 90. Objects are backed - ‘sourced’ - by events Which is just a fancy way of saying: All Objects are ‘backed’ or ‘sourced’ by various events from the Journal or Event Stream
  • 91. @svpember Now this has lots of powerful uses which we’ll get into in a bit, but regardless…
  • 92. Can be difficult for Junior Engineers I’ve found that this entire concept can be a bit difficult for junior developers to grasp at first. So be aware of that. Internal Education can help dramatically.
  • 93. But, why? I’m sure that I’m making this all sound very attractive. You’re again probably asking yourself… ok, great… but why? Ok, so why?
  • 95. @svpember Event Sourcing: Why • Append-Only • Prevents Data Loss
  • 96. Never Delete! With Event Sourcing, no events are EVER deleted or updated, a nice side effect of the Append only nature
  • 97. @svpember Event Sourcing: Why • Append-Only • Prevents Data Loss • Time Travel
  • 98. @svpember Event Sourcing: Why • Append-Only • Prevents Data Loss • Time Travel • Perfect For Time Series
  • 99. @svpember Event Sourcing: Why • Append-Only • Prevents Data Loss • Time Travel • Perfect For Time Series • Automatic Audit Log ++ Built in, Automatic Audit Log for your Entities
  • 100. Audit Logs tell the History Events tell the Intent of History
  • 101. Furthermore, Having the Events as a first-order member of your platform can give you enhanced information around what your users or systems are doing beyond what might normally get written to the database. Can make events that don’t necessarily deal with the properties changed by a user, but additional actions that may have occurred And it’s easier to work with and analyze the data if the events are integrated within your platform already.
  • 102. @svpember One trivial example is this. One of our first ES systems was an Internal User Management system, where our Program Managers (don’t worry about these terms) track prospective ThirdChannel employees, which we call agents. Our Managers wanted a way to get history for each potential agent, and because of Event Sourcing, it was about 5 minutes of work to display the history of each agent like that.
  • 103. @svpember Event Sourcing: Why • Append-Only • Prevents Data Loss • Time Travel • Perfect For Time Series • Automatic Audit Log ++ • Data Mining and Reporting
  • 104. @svpember And so you may be thinking, ok great: I can get all the relevant events for Bob’s shopping cart, but I’m just only ever going to play them all to get his current state! And for much of the time, yes, users definitely want to know what their current state is. The magic comes though with the business value. Business loves time series data
  • 105. @svpember Reporting in Event Temporality • Look at ALL users’ ProductRemoved events: which products are being discarded? • Find ALL ProductAdded + ProductRemoved event pairings (i.e. the same product, user) that occur within 5 minutes: perhaps a user is hesitant on purchasing… maybe offer them a discount? • Find avg time between ProductAdded and OrderPlaced: how long are users sitting with non-empty carts? • Find Trends in all of the above Because you have access to all the data changes that have happened, along with the time relationships… you can get extremely creative. Now, I’m very creative, but here are a few ideas for reports on just users’ shopping cart events…
  • 106. Business Types love Reports I assure you, if you start showing off or even hinting to your product research teams, your product owner teams, etc that these capabilities could exist within your platform… they will get very excited
  • 107. Collecting and applying your events like this is known as a ‘projection’. You’re taking events from your various event streams and projecting them into or onto some data structure. You can also think of it as a Materialized View. maybe just an image and talk about updating projections
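Here is a minimal sketch of such a projection (illustrative names; a real one would typically write to a database table rather than an in-memory map): it listens for events and keeps a small, query-friendly view up to date.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Projects "open item count per todo list" out of the event stream
class OpenItemCountProjection {

    record ItemAddedEvent(UUID listId, String description) {}
    record ItemCompletedEvent(UUID listId, String description) {}

    private final Map<UUID, Integer> openItemsPerList = new ConcurrentHashMap<>();

    void on(ItemAddedEvent event) {
        openItemsPerList.merge(event.listId(), 1, Integer::sum);
    }

    void on(ItemCompletedEvent event) {
        openItemsPerList.merge(event.listId(), -1, Integer::sum);
    }

    // The query side reads this directly: no joins, no replay at request time
    int openItems(UUID listId) {
        return openItemsPerList.getOrDefault(listId, 0);
    }
}
```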
  • 108. Is there a performance hit? Save ALL the events? So, that all being said.. Two questions I generally receive when talking about this…<read> The answer to the first is yes, there is a little performance hit if you play events each time during a user request. However, we’ve found that it’s fairly minimal for entities with fewer than a hundred or so events on them. It’s one SQL query to get all events for an entity to see current state, just as it would be with an ORM. However, we typically tend to follow the CQRS methodology and build current state projections for run time querying that users see. Now, for the answer to the second, let’s jump to the next section….
  • 110. Event Storage Ok, With that all in mind, let’s proceed to the next section, Event Storage… let’s talk about how we store all these events. First, I want to address the two questions I left on.
  • 111. First, in an event oriented world, you’re going to acquire quite a few events. Going to seem pretty messy after a while. You’ll end up with perhaps larger database sizes than you originally thought, though, remember this is largely due to the fact that you’re now tracking a third dimension of time within your data store And really… data storage is CHEAP. most of us are not netflix or facebook and the scale of events we’ll be working with is very manageable.
  • 112. Now, of course, if this is at all bothersome, you can adopt a compaction strategy. The most well known is SnapShotting, where you compress related events older than say, two years into one object, then extract the raw events and put them into a cheaper long term storage like AWS glacier. Still never throw them away, though.
  • 113. @svpember You can also use Snapshotting more frequently as a mechanism to alleviate performance troubles, too. Make a snapshot on some interval, say every week… or every 100 events. You load the snapshot first, then find all events since the snapshot was taken. The issue here is that this adds an additional query to fetch the snapshot, so it’s only worthwhile if the events you’d otherwise have to process take longer than that additional database query. If your events are pure functions, it will take a fairly high number of them to be worse than the db query.
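A hedged sketch of that snapshot-plus-tail loading (interfaces and names are assumed): load the latest snapshot if one exists, then apply only the events recorded after it.

```java
import java.util.List;
import java.util.Optional;
import java.util.UUID;

class SnapshottingLoader {

    // Placeholder entity so the sketch stands on its own
    static class TodoList {
        TodoList apply(Object event) { return this; }
    }

    record Snapshot(UUID entityId, long lastRevision, TodoList state) {}

    interface SnapshotStore { Optional<Snapshot> latestFor(UUID entityId); }
    interface EventStore { List<Object> eventsAfter(UUID entityId, long revision); }

    private final SnapshotStore snapshots;
    private final EventStore events;

    SnapshottingLoader(SnapshotStore snapshots, EventStore events) {
        this.snapshots = snapshots;
        this.events = events;
    }

    TodoList load(UUID entityId) {
        // One extra query for the snapshot, but far fewer events to apply afterwards
        Optional<Snapshot> snapshot = snapshots.latestFor(entityId);
        TodoList state = snapshot.map(Snapshot::state).orElseGet(TodoList::new);
        long since = snapshot.map(Snapshot::lastRevision).orElse(0L);

        for (Object event : events.eventsAfter(entityId, since)) {
            state = state.apply(event);
        }
        return state;
    }
}
```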
  • 114. Event Schema Anyway, let’s discuss what an event looks like on disk. And by that, I mean… physically, what does our database schema look like? With event sourcing, there’s no real ‘correct’ way of doing things. You can use any type of data storage to store your events, although we at ThirdChannel prefer Postgres and have had some experience with Cassandra. Anyway, I think there are generally two approaches to what an event looks like on disk. It’s either:
  • 115. @svpember Table Per Event: specific schema for the properties of an event
  • 116. (List of fields.) An event, at minimum, must have some identifier (we use UUIDs), a link to the entity id that this event belongs to, the revision number of the event - this helps keep events ordered and helps guard against concurrent writes via optimistic locking - and the user id of the user who triggered it. Old price, new price, and currency are all specific to this event.
  • 117. @svpember Or, one Event Table. In this scenario, as you may imagine, ALL your events are in one table. It’s easy to, say, shard this table by date or something, but basically, one event table. Here, our events are essentially schema-less. Or rather, there is an implicit schema: as your events are parsed or deserialized from disk, the application maps them onto the appropriate event classes.
  • 118. (Example db schema.) Does anyone here not know about the jsonb datatype? Seriously, write this down. Go look at this thing. Switch to using Postgres entirely for it. It’s a better document store than MongoDB. I don’t like MongoDB very much, although admittedly I haven’t given it a fair shake these past few years. True story: Monday a friend of mine…
  • 119. @svpember Recommend: One Event Table • No Migrations, past the first (+) • Trivial to Add New Events (+) • Selecting Multiple Event Types for a single Entity in one Query, no joins (++) • ProTip: Use Postgres and the jsonb data type (+) • Querying across multiple event types with no joins (+) • Zero to Minimal Database Constraints (-) My recommendation is One event table… + and -
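To make the “one event table” concrete, here is a hedged sketch of what such a Postgres table might look like, created over plain JDBC; the column names are assumptions modeled on the fields mentioned a couple of slides back (id, entity id, revision, user id, event type, jsonb body), and the connection details are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateEventTable {
    public static void main(String[] args) throws Exception {
        // Append-only event journal; every event type lands in this one table
        String ddl =
            "create table if not exists events (" +
            "  id          uuid primary key," +
            "  entity_id   uuid        not null," +
            "  revision    bigint      not null," +
            "  user_id     uuid," +
            "  event_type  text        not null," +
            "  body        jsonb       not null," +   // event-specific fields live here, schema-less
            "  occurred_at timestamptz not null default now()," +
            "  unique (entity_id, revision)" +        // keeps events ordered, guards concurrent writers
            ")";

        // Assumed connection details for illustration
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost:5432/app", "app", "secret");
             Statement statement = conn.createStatement()) {
            statement.execute(ddl);
        }
    }
}
```

Fetching every event type for a single entity is then one query with no joins, along the lines of: select * from events where entity_id = ? order by revision.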
  • 120. Data Locality & Service Lifecycle This title may be strange, but bear with me. So, now that we’ve chosen the schema for events… the question still stands: how does this change in light of a distributed environment? If I’ve been arguing that we can have multiple data stores within our system, and each of our services is generating events… well, where should these events physically live?
  • 121. @svpember It’s a bit of a spectrum. Given the assumption that each service has a data store, I think there are two basic ‘pure’ strategies for event storage.
  • 122. Service - Local Storage On one end, each service is responsible for storing a certain set of domain events. Basically, anything that a service emits it should also store. This of course requires that each service have its own datastore and generally will operate in a way you may be accustomed to. In addition to the events, you’ll likely have models or materialized views representing the current state that are updated by the events
  • 123. Central Store • On the other end is the central event store. It has some mechanism to listen to all events that are emitted within your system and then save them to one general data storage. Conceptually it’s one service that writes to that store (single writer), but it can handle read requests from other services (single writer, multiple readers).  ◦ What’s interesting about this pattern architecturally is that it opens up your services to not needing a local data store… they would be entirely event driven and hold their entire state in local cache. It may sound crazy, but it is entirely feasible in this structure. Anyway, which approach to pick?… I don’t think there’s a right answer; it’s really what’s comfortable for you and your team. What can help, though, is looking at different lifecycle moments your services can go through, to better illustrate each approach.
  • 124. @svpember Event Storage Workflow Scenarios • How Does a Service access events? First, the most basic. What happens when a service wants to access events?
  • 125. @svpember Distributed (Local) How do we query? The answer should be fairly obvious:
  • 127. @svpember Central Store <walk through> In this scenario, I would advocate skipping routing the requests through the message queue. The central store should be fairly prominent. Direct TCP/HTTP queries should be just fine. With the distributed scenario, services may come and go, and you typically will broadcast the query to receive events, without knowing exactly who/what contains those events. With the central store, you do know, so you can typically skip around the Message Queue if you’d like.
  • 128. @svpember Event Storage Workflow Scenarios • How Does a Service access events? • What happens when we bring up a new service? What happens when we bring up a new service? In this scenario: - new service - empty - needs various events from different domains in order to bring itself into alignment with the current state of the other services
  • 130. @svpember Distributed: the new service appears and announces “I need a, b, c!”; the other services respond.
  • 132. @svpember Event Storage Workflow Scenarios • How Does a Service access events? • What happens when we bring up a new service? • What happens when a service misses or fails to process an event? What happens when a service misses one? I’d argue that it’s difficult to ‘miss’ an event if you’re using a message broker; it should hold on to messages until services can read them. But, perhaps there’s some catastrophic event and you lose messages on the queue. What’s more likely is that your service won’t know how to handle an event it receives. What we’re really talking about here is the ability to reprocess events.
  • 133. @svpember Distributed It’s basically the same as when a service comes online, although you’ll need a smaller set of data
  • 135. @svpember Event Storage Workflow Scenarios • How Does a Service access events? • What happens when we bring up a new service? • What happens when a service misses or fails to process an event? • What about out-of-order events?
  • 136. @svpember Event Storage Workflow Scenarios • How Does a Service access events? • What happens when we bring up a new service? • What happens when a service misses or fails to process an event? • What about out-of-order events? • What is the process for decommissioning a service?
  • 137. @svpember Decommission - Distributed • Are we bringing up a new service? • Are we simply killing functionality? • Don’t get rid of the events! With Distributed, you need a plan for what to do with the events you have in the system you’re shutting down. - If you’re replacing an old service with a new, refactored version of it, you should be fine, so long as the new version knows how to respond to the same requests for data and to handle the same commands and events - If you’re killing the service entirely, that likely means the functionality is also going away. - Still, something needs to be responsible for the events to support old requests. - Consider offloading the events and any query mechanisms to the most closely related service
  • 138. @svpember Decommission - Centralized • … just delete the service With Centralized, the process is much easier. If you know the service isn’t needed anymore… well, it’s gone. The events it was responsible for creating are still in the central store should you need them.
  • 139. Recap
  • 140. @svpember Per-Service Storage • Less infrastructure • Proper containment of events • Requires multiple event consumers and ‘rebroadcast’ mechanisms
  • 141. @svpember Centralized Storage • Much larger footprint • Convenient • Violates ‘self-containedness’ and distribution of events • Easier for mining purposes • One rebroadcast mechanism
  • 145. @svpember Event Storming Building Blocks • Events • Reactions - “Whenever an account is created, I need to send an email” • System Commands • User-Initiated Commands • External Systems • Policy - Flow of Events and Reactions
  • 147. Conclusion: Yay, Events. I realized that this talk ends rather abruptly. So here’s a slide to pace things out. Events are great.
  • 149. @svpember Images • Ant with stick: https://www.reddit.com/r/photoshopbattles/comments/1uh80p/ perfectly_timed_photo_of_an_ant_lifting_a_stick/ • Why - man with bowtie: https://silktide.com/dear-ico-this-is-why-web-developers-hate- you/ • Why - Ryan Reynolds: http://www.reactiongifs.com/r/but-why.gif • Why - Jon Stewart: https://www.huffingtonpost.com/2013/10/25/jon-stewart- apologizes-for-us_n_4162980.html • EventStorming: https://blog.redelastic.com/corporate-arts-crafts-modelling-reactive- systems-with-event-storming-73c6236f5dd7 • Fireworks: https://commons.wikimedia.org/wiki/ File:Canada%27s_fireworks_at_the_2013_Celebration_of_Light_in_Vancouver,_BC.jpg • Mad Developers: https://www.hbo.com/last-week-tonight-with-john-oliver • Workers removing delete key: https://gcn.com/articles/2015/03/31/deleted-emails.aspx
  • 150. @svpember Further Reading • Event Versioning: https://leanpub.com/esversioning/read • Event Storming: https://blog.redelastic.com/corporate-arts-crafts- modelling-reactive-systems-with-event-storming-73c6236f5dd7