Event storage offers many practical benefits to distributed systems, providing a complete record of state changes over time, but there are a number of challenges when building an event store mechanism. Stephen Pember explores some of the problems you may encounter and shares real-world patterns for working with event storage.
3. @svpember
Oliver Gierke, who’s the Spring Data lead at Pivotal, sent out this tweet.
<read>
So I, uh, rather enthusiastically responded…
4. @svpember
by deluging the poor guy with tweets until he responded with 'heyyy this has been helpful, thanks! *Upside-down smiley face*'
Other people jumped on the conversation too, of course, but I think I was the most. ah. enthusiastic.
5. @svpember
One of the other participants tweeted this: “maybe you should make a blog post?”
Which is wonderful, right? Validation from strangers on the internet!
I thought better yet, I’ll make a talk.
6. Event Storage in a
Distributed System
Steve Pember
CTO, ThirdChannel
steve@thirdchannel.com
Software Architecture Conf 2018: NYC
@svpember
So I wrote it up, submitted it, and here we are.
My name is Steve. I work for a company called ThirdChannel, out of Boston.
Now, I realized that I accidentally checked the 90-minute box when submitting this talk, so, ah, the scope of this is a little bit more involved than the title
initially suggests.
Let’s talk about Events
7. It’s all about Events
This talk is all about events. You’re going to be sick of hearing the word by the end of this presentation.
However, bear with me. it is important.
I feel that systems that are not representing transitions within themselves as events, and are not actively listening to or taking advantage of these internal
events… that are not, quote-unquote, 'reactive'… are missing out on huge advantages in flexibility, reporting, and scalability, both in terms of deployments
and operational / developmental scalability.
Now, after that bold statement… well, we’ll get into all of that, but first we should discuss the most fundamental question…
8. What is an Event?
What is an event?
Does anyone want to take a stab at defining what we should consider an event?
9. @svpember
Event
• Something that has occurred in the past within some software
• Intended for consumption by other software
• Distribution is Often asynchronous
• Often contains data detailing the event
• Immutable
Well, I would classify it as…
a piece of data that signifies some action has been performed in the past within some software.
The two most important bits are 'in the past' and 'immutable'.
10. @svpember
So, events like “Order placed”…
Are all great. They denote that something has happened.
Immutability is another important part. Let's say 'ItemsShippedEvent' was emitted with a value of 5. It would potentially be disastrous for something to
later change that value to, say, 1000, right? It would disrupt all meaning.
11. –Martin Fowler “Domain Event”
https://martinfowler.com/eaaDev/DomainEvent.html
Things happen. Not all of them are interesting,
some may be worth recording but don’t provoke
a reaction. The most interesting ones cause a
reaction. Many systems need to react to
interesting events. Often you need to know why a
system reacts in the way it did.
Because it wouldn’t be an Architecture talk without a Fowler quote…
Another way to think of events.. which frames most of this discussion, is a quote from Martin Fowler:
don’t read!
Basically: “Important events cause reactions elsewhere in the system, and it’s often important to understand why those reactions occurred”.
12. As an aside… Reacting to events may be nothing new to Javascript or frontend developers
• Your browser’s DOM, Javascript, and I suppose UIs in general are full of events. Literally anything you do on the browser generates an event.
Move the mouse, click a box, type a letter, let go of a letter, etc.
• While the knowledge of this talk is transferable to the frontend to some extent, the majority of this talk is focused on the server side.
The server side doesn't traditionally deal with a lot of events, I'd say, particularly if you started your dev life and career with big frameworks like Rails, Grails,
Django, etc.
13. It's not hard to program in terms of events.
Generally, one or more events are created when a user successfully performs an action or Command. (slide of various event names; will reuse this aesthetic
later). They represent successful deltas, or actions that have occurred in the past.
Now, here, we have code which accepts some incoming Command object to create a new TodoList, validates it, and generates two events based on that
command, saves the events, saves a projection of the current state of the TodoList entity (though this is optional, as I could recreate the state of the todo
list entirely from the events), and transmits them.
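The flow described on this slide might look something like the following sketch. This is not the talk's actual code; every class and event name here is a hypothetical stand-in, and the save/transmit steps are only noted in comments.

```java
// Sketch of a command handler: validate an incoming command, then
// generate events describing what happened. All names are illustrative.
import java.util.List;
import java.util.UUID;

public class CreateTodoListHandler {

    public record CreateTodoListCommand(String name, String owner) {}

    public interface Event {}
    public record TodoListCreated(UUID listId, String name) implements Event {}
    public record OwnerAssigned(UUID listId, String owner) implements Event {}

    public List<Event> handle(CreateTodoListCommand cmd) {
        // Validate the command before any event is produced.
        if (cmd.name() == null || cmd.name().isBlank()) {
            throw new IllegalArgumentException("A todo list needs a name");
        }
        UUID id = UUID.randomUUID();
        // Two events are generated from the single command.
        // A real handler would now save these events, optionally save a
        // projection of current state, and transmit them to subscribers.
        return List.of(
            new TodoListCreated(id, cmd.name()),
            new OwnerAssigned(id, cmd.owner())
        );
    }
}
```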
14. My domain objects, or 'Entities', start becoming highly functional as they acquire
methods to manipulate them by applying events.
This is certainly not production code, but you can see how my entities start acquiring handlers that, when provided various events, know how
to use each event to update their internal state. In production, I'd probably have the entity use an internal mutable builder; the builder would receive the
events and then spit out a validated, immutable Entity. But alas, this is an example.
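As a rough illustration of those event-applying handlers (again, names are hypothetical, and this mutates the entity directly rather than using the builder approach mentioned above):

```java
// An entity whose state is built up by applying events. Each apply()
// overload knows how to use one event type to update internal state.
// Illustrative only; not production code.
import java.util.ArrayList;
import java.util.List;

public class TodoList {
    public record ItemAdded(String description) {}
    public record ItemCompleted(String description) {}

    private final List<String> openItems = new ArrayList<>();
    private final List<String> doneItems = new ArrayList<>();

    public void apply(ItemAdded e) {
        openItems.add(e.description());
    }

    public void apply(ItemCompleted e) {
        openItems.remove(e.description());
        doneItems.add(e.description());
    }

    public List<String> openItems() { return List.copyOf(openItems); }
    public List<String> doneItems() { return List.copyOf(doneItems); }
}
```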
15.
16. @svpember
A slide back I mentioned that Events need to be transmitted. Well, these events need to be seen by others to be useful. It’s one thing to have my entity only
see its events, but it’s entirely another thing to share.. and mix events from across the system.
And so we need some method of transmitting these events to other interested parties,
both internal and external.
For internal distribution, this typically involves some asynchronous publish/subscribe mechanism. Tools that I've used successfully for this purpose include a
library called Project Reactor, and Reactive Streams via the RxJava library.
17. @svpember
Externally, these events can be transmitted either via point to point http or via some asynchronous message queue… which as we’ll see later is my
preferred method.
18. At this point you might be saying… ok, cool,
but why…
That’s one point of criticism I often get for my talks. I mention all these great things I’m working with but neglect to really hammer home the ‘WHY’ of it all.
So.. why? why should you care about any of this? so far it just looks like I’m adding a bunch of extra hassle for you.
19. The reason is that events, these smaaaaallll bits of information, are collectively extremely powerful.
The point of this presentation, the synopsis, the end goal, is to try and show you that tracking events, persisting them, and treating them as first-class
citizens within your system is a wise idea with loads of potential benefit. AND that there are some caveats to be aware of when we talk about how to store
these events and work with them within a distributed environment.
However, there are some steps in the way to get there.
20. @svpember
Overview
• Event-Oriented Distributed System Architecture
Today, we’re going to discuss some information that I’ve broken down into the following topics:
<read> The architectural background of this talk. This is an architectural conference, and it’s important. This will cover some concepts and architectural
designs to help prepare your systems to think in terms of events.
And because we’re architects, we’ll probably have some boxes and lines drawn up on the screen, because it wouldn’t be an architectural presentation
without good ol boxes and lines
21. @svpember
Overview
• Event-Oriented Distributed System Architecture
• The Power of Events
after we get in the mindset of working with events and architecting our systems to operate in an event-first fashion, we’ll look into why you should be
excited about having events laying around as first-class citizens within your app. I do this topic second, as I think the architecture portion is the harder pill
to swallow… plus I think that getting your head around the existence of events makes it easier to start to see their usefulness. Although I could be wrong,
let’s give it a go.
22. @svpember
Overview
• Event-Oriented Distributed System Architecture
• The Power of Events
• Event Storage & Lifecycle
After that, we’ll discuss the impact of storing events, different patterns for doing so, and some lifecycle concerns for services in an event based
environment
23. @svpember
Overview
• Event-Oriented Distributed System Architecture
• The Power of Events
• Event Storage & Lifecycle
• Day to Day Concerns / Working with Events
And then finally, some additional details of working with events that don’t necessarily have to do with storing them.
Alright… let’s begin
24. Let’s start with Microservices
First, a few minutes on Microservices
25. I’ve borrowed this slide before. Thanks Alvaro!
I like it because it’s honest. You start with a mess and if you’re not careful you end up with a distributed mess
Anyway…
• How many of you have attended SACon before? This is, I think, my third or fourth time over 4 years. I’m pretty sure the first two years were
almost exclusively talks about microservices. I know I contributed to it, eh?
And there’s a good reason
This notion of Microservices has been a great transformational thing in software development and architecture. Even if you think it’s a rehash of SOA,
it still has been promoting the virtues and popularity of distributed systems with the larger community… which I think is a good thing.
• Now… Just to get a general poll… who's working with them?
And who here likes working with them?
Any hands go down? Aw, some jaded folks.
26. @svpember
The Promise of Microservices
• Reduced complexity per service
• Easier for developers to understand a single service
• Teams work with more autonomy
• Independent scaling, deployments, builds
• Fault isolation
• “Right tool for the job”
• Isolation and decoupling
• Continuous Delivery and Deployment
The promise of Microservices is very alluring, yeah?
Right out of the gate, we immediately reduce the complexity of our codebase by making it several smaller codebases
My favorite: It allows teams to work in great autonomy, with improved isolation and decoupling.
It allows for independent scaling of services.
The most powerful is that microservices allow for Continuous Delivery and Continuous Deployment, which honestly is, I think, the pinnacle of efficiency
a software dev team should be striving for.
Now I’m being a bit flippant about that because we should be concerned about testing and regressions of our releases, of course… but all of this is another
topic entirely.
27. @svpember
… Some Caveats
• Vastly increased infrastructure complexity
• So much Ops
• Teams need to handle all lifecycle steps of service deployment
• Conceptual difficulty with multiple service deployments
• Potential performance hits for intra-service comms
As useful and as powerful as all of that is… there are absolutely some tradeoffs when using microservices.
You immediately… IMMEDIATELY have increased complexity in your infrastructure.
And going back to my point that micro services have been good for growing awareness within the community… the rise of tooling in this space has just
been insane. Kubernetes, Hashicorp’s entire business model… it’s great stuff
The point here is that if your team isn’t ready to shift the complexity of the codebase into infrastructural and ops complexity, you should probably hold off.
28. A few things bothered me…
That being said, I still think that the Microservice approach is very useful.
However, as it's been growing, three points have always bothered me that I never felt were fully discussed or agreed upon in the presentations I've
seen and material I've read.
Just as a preview… did everyone see the keynotes yesterday morning? Cornelia Davis started out her presentation listing issues with distributed systems
and it spoke to my soul man. She basically gave this talk already.
30. @svpember
You see, when the term 'microservice' became the rage, it was my observation that people were building services which utilized point-to-point synchronous
HTTP comms to query, post, etc. data between services. There'd be service discovery systems in play, necessary to make services aware of each
other's existence.
These synchronous calls utilize resources (e.g. threads), block, take time… and if a service goes down, what happens if another service is reliant upon it?
31. @svpember
slide of multiple services being needed to support a single hop
And I'm aware of Netflix's Hystrix and other circuit-breaker technologies to help with all of this, but it still seems a lot could go wrong in that chain.
32. Time to Go Reactive
To address point 1, I suggest embracing a design pattern known as ‘Reactive’
33. @svpember
Reactive Systems
• Communication between services driven by asynchronous events
Has anyone here heard of the 'Reactive Manifesto'?
I'm a big fan of it, but I'm going to put a bit of a spin on its tenets to fit my narrative here.
Anyway, when I say 'Reactive', we don't mean 'React.js' or Reactive Streams (though I love both of those things).
It's a design philosophy to apply to systems to help them achieve high scalability.
The first rule for Reactive Systems is that…
On a positive note, based on what I've been seeing over the past year or so, the collective opinion is moving away from entirely HTTP toward being more event-driven,
which is great. There are, for example, several talks at this conference on this very subject.
35. @svpember
Reactive Systems
• Communication between services done by asynchronous events
• Services ‘React’ to observed events
• Use some Message Broker technology to promote Async and reduce
Data loss
36. @svpember
Reactive Systems
• Communication between services done by asynchronous events
• Services ‘React’ to observed events
• Use some Message Broker technology to promote Async and reduce Data
loss
• Synchronous HTTP calls between services kept to a minimum
By reducing the # of synchronous calls, we gain a few main benefits:
- less resource contention on the thread pools of each of our services
- a firewall-like effect: if services die, they don't cause other systems to fail or have to rely on fallback circuit-breaker code
- we can also reduce the number of calls by collecting data from other services
One side effect of this and the previous points is that your platform should become quite fast, as well as have… <next>
37. @svpember
Reactive Systems
• Communication between services done by asynchronous events
• Services ‘React’ to observed events
• Use some Message Broker technology to promote Async and reduce Data
loss
• Synchronous HTTP calls between services kept to a minimum
• Resiliency against failing services
- By reducing or eliminating the runtime dependency on other services, each service can function in isolation and will not be brought down by
failing services.
And that’s my basic overview of Reactive systems. Everyone with me so far? No. too bad, let’s keep going.
38. @svpember
Questions about Microservices
• How should they communicate?
• How “large” should a micro service be?
• How much responsibility should a single service have?
Now, the next two…
<read>
I feel can be solved or addressed by an architectural design pattern known as…<next>
39. Domain Driven Design
Command Query Responsibility
Segregation
Domain Driven Design (or DDD) and a related variant called Command Query Responsibility Segregation (aka CQRS)
Are two architectural patterns intended for very complex systems. These are not trivial things to understand and you should take care if you plan to adopt
them. They have great power, though
40.
41. - Eric Evans
“Some objects are not defined primarily by their
attributes. They represent a thread of identity
that runs through time and often across
distinct representations. Sometimes such an
object must be matched with another object
even though attributes differ. An object must be
distinguished from other objects even though
they might have the same attributes.”
One of the most interesting parts of DDD, one that really stuck with me, is this quote: <read quote>
- that’s interesting, yeah?
If I change my name, am I no longer me? Of course not, I’m defined by more than my name. If I change my email address, or my address, or my
social security number for some reason, am I no longer me? Obviously not… Can your database understand identity changes like this and still be able to
find the original object?
42. @svpember
Domain Driven Design
• Ubiquitous Language
DDD has several interesting ideas besides that quote of course. When building a system adhering to DDD, it offers several guidelines.
The first: Ubiquitous Language.
Everyone in the company should be speaking in the same terms. The same concepts. Your classes and objects should reflect the Language.
Marketing should be using the same terms as Sales and as Engineering. When Engineering builds a new Feature or new Service the entire company should
be using it. If you’re an e-commerce app and product management decides to name the Product Catalog… ah… ‘Zephyr’ for some reason, Engineering is
also calling it Zephyr, and there better be a Zephyr.java file somewhere in the repo.
44. Entity Objects are those that you truly care about tracking. These objects will have specific identifiers and you will take great care in maintaining their
individuality and their relationship to other objects.
45. Value Objects: when you know you have a lot of something, but you don’t care about the identity of each individual item.
The example given in the DDD book, I believe, is that an automobile Engine might be an Entity (it has a unique serial number that mechanics might care
to track), while the wheels… well, the car has 4, and I don't necessarily assign any uniqueness to them.
47. Group of Entities With One Root
… the Aggregate Root. The root acts as the parent or entry point when referencing the aggregate, which leads to the next point:
48. @svpember
Domain Driven Design
• Ubiquitous Language
• Entities / Value Objects
• Aggregates
• Bounded Contexts
powerful concept
logical grouping of related functionality
49. @svpember
The key point here is that Objects inside an Aggregate may hold references to other Roots. But only the root id. They may not hold any information about
entities below the root within that context.
50. Combine and Isolate related
objects into Modules
It’s a natural step then in your code to ensure that you combine and isolate all related objects within a context into one module.
51. No Direct comms across
boundaries
And, no direct communications are allowed across context or module boundaries.
Well if that’s true, what do I do if I need information from across boundaries?
53. @svpember
Here we have several of our Modules - whose names mirror our Ubiquitous Language, btw. They cannot talk to each other directly, but rather through some
intermediary mechanism. Now, it could be direct message passing, but this differs from importing and calling methods directly in the module.
You could use a Pub Sub Mechanism, or some sort of message broker… etc, etc.
The important bit is that the modules are bounded away from each other.
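One way to picture that intermediary mechanism is a tiny in-process event bus: modules publish to topics and subscribe to topics, and never import each other directly. This hand-rolled version is purely illustrative; as noted earlier, production systems would more likely use Project Reactor, RxJava, or a message broker.

```java
// A minimal in-process publish/subscribe intermediary, sketching how
// bounded modules might exchange events without direct references.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class EventBus {
    private final Map<String, List<Consumer<Object>>> subscribers = new HashMap<>();

    // A module registers interest in a topic.
    public void subscribe(String topic, Consumer<Object> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    // A module publishes an event; every subscriber on that topic sees it.
    public void publish(String topic, Object event) {
        subscribers.getOrDefault(topic, List.of()).forEach(h -> h.accept(event));
    }
}
```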
54. Events Are Transactionally Safe
That is, no Events are emitted until an item is successfully saved to disk. The event is part of the transaction.
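A toy in-memory simulation of that rule (names hypothetical): events are journaled and emitted together as one unit of work, so nothing is published unless the save succeeds. A real implementation would use database transactions and publish only after commit.

```java
// "The event is part of the transaction": if the save fails, no event
// reaches the journal or the subscribers. In-memory sketch only.
import java.util.ArrayList;
import java.util.List;

public class TransactionalEventStore {
    private final List<String> journal = new ArrayList<>();   // durable event log
    private final List<String> published = new ArrayList<>(); // what subscribers saw

    public void saveAndPublish(List<String> events, boolean saveSucceeds) {
        if (!saveSucceeds) {
            // Simulated rollback: no event is journaled or emitted.
            throw new IllegalStateException("save failed; transaction rolled back");
        }
        journal.addAll(events);   // commit to "disk"...
        published.addAll(events); // ...then, and only then, emit
    }

    public List<String> journal()   { return List.copyOf(journal); }
    public List<String> published() { return List.copyOf(published); }
}
```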
55. @svpember
Domain Driven Design
• Ubiquitous Language
• Entities / Value Objects
• Aggregates
• Bounded Contexts
• Domain Events
• One Last Takeaway…
56. Wait… what if I need to share
across contexts?
This may not be phrased correctly, but the answer to this is one of the most powerful aspects of DDD and one of the hardest to get used to.
Bounded contexts are isolated, autonomous components with their own entities, classes, service objects, etc…. However, all the bounded contexts exist
within the same system, and certain concepts or Entities will likely exist throughout the entirety of a system… although each context may only care about a
subset of the info about that entity.
Or, another way to phrase it: each bounded context is only concerned with some subset of an Entity within a system, and no context will know the entire
set of information about an entity. This separation is the concern of the context’s boundary
57. @svpember
- For example!
The catalog context knows how much inventory is left for this particular SKU, but the shopping cart and the admin context don’t necessarily need the
information. The inventory count may be a function of a ‘Warehousing context’ that the catalog receives events for.
Similarly, the Shopping Cart context contains the quantity, the # of this SKU that the user wants to purchase. That information has no bearing on the
catalog context.
This concept - that an entity can exist in multiple contexts, though each context is only concerned with a subset of that information - is very powerful, and
very useful.
Understanding which information belongs in which context… and maintaining that decoupling, is however, one of the toughest aspects of DDD, and can be
a challenge for younger developers or those newer to ddd to grasp.
For example, we recently had a situation…
58. And now, CQRS
Command Query Responsibility Segregation is an evolution of DDD that calls for changes in how one accepts and sends data.
59. MVC
With your standard MVC/ CRUD style approach that you get out of the box with many big frameworks, the pattern generally is something like the following:
- user makes a change via the ui, let’s say to a Product object
- There’ll likely be a Product Controller which takes that data
- passes it to a Product Model
- which in turn gets validated and saved to the database, likely in a table named ‘product’
When the user wants to retrieve information about the product, the same objects are used. Query for product with id X, product controller uses a product
model to retrieve data from the product row, then passes the retrieved model up to the ui
60. CQRS
CQRS says: why use the same objects for every task? It makes a hard distinction between modifying actions the user is attempting to do - which it calls
‘Commands’ - and data retrieval tasks - aka Queries - , and thus breaks up the underlying code to enforce that distinction.
So, If a user wants to make a change to some data, say again, a Product, he manipulates the change in the ui, maybe clicks a button, and a relevant
controller converts that request into a ProductChangeCommand object or model, which contains details on what the user is trying to change. That
command is then validated, then the changes are persisted AND, while not shown here, domain events are emitted.
As for Querying, a User dictates the query, it’s packaged by the query controller into a query object, results are pulled from the database, and the response
is returned to the user.
While it reads similarly, it's important to understand that the models are different, and often advantageously so. The Product information I display to the end
user may be only a subset of the Product model / entity object, so I only pull a few fields from the db. Or, my query object represents a composite of
several models in one multi-faceted report that is pulled in one query.
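The command/query model split might be sketched like this. The `Product` and `ProductListView` shapes are hypothetical examples, not anything from the talk; the point is only that the write side handles the full validated entity while the read side returns a slim, purpose-built view.

```java
// CQRS in miniature: a full write-side model, and a separate read-side
// model containing only what a catalog listing needs. Illustrative only.
import java.util.HashMap;
import java.util.Map;

public class ProductCqrs {
    // Write-side model: the full entity, validated before persisting.
    public record Product(String sku, String name, String description,
                          int inventory, long priceCents) {}

    // Read-side model: only the fields the UI displays.
    public record ProductListView(String sku, String name, long priceCents) {}

    private final Map<String, Product> store = new HashMap<>();

    public void handleChange(Product changed) {
        if (changed.priceCents() < 0) {
            throw new IllegalArgumentException("price must be non-negative");
        }
        store.put(changed.sku(), changed); // ...and emit domain events here
    }

    public ProductListView query(String sku) {
        Product p = store.get(sku);
        return new ProductListView(p.sku(), p.name(), p.priceCents());
    }
}
```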
61. CQRS
Following that line of thinking, we can extend this a bit further.
We can isolate our writes and our reads into separate contexts or services. Besides the nice decoupling, it allows us to get creative in other areas:
- want to scale out our write capability vs our reads? no problem. just scale up one of those services
- if our write service emits domain events when it saves, could our query service listen for other domains' events? Sure!
- want to have custom query reports that pull from multiple domains? no problem! You can build multiple query models that are highly targeted towards
what ever end user report or experience you’re trying to deliver.
63. This is a graphic taken from Udi Dahan's website; he's another pioneer in the CQRS space.
What this diagram depicts is similar to what I've been describing.
The blocks labelled ‘AC’ stand for autonomous component, I think. Think of them like a service.
So, in the bottom left, the user enters a command to the first service. it succeeds and the changes are written to local storage. Events are published and
retrieved by one or many other services, who update their local query caches based on those events. Then, when the user performs a query, it hits that
highly targeted query cache, giving the user the intended results with a minimum of sql queries or joins.
64. Ok then, it still seems like a lot of trouble… so why? Why go through all of this?
65. @svpember
Reactive + CQRS
• System is ideal for both write- and read-heavy applications
• writes are contained for a particular service / bounded context
• scale up services receiving writes
• create query caches specifically designed to handle queries
• can scale those up too
• Efficient Querying -> just going to highlight this again
• One note: due to the distributed - and as we’ll see soon event based storage, if your company has analysts they will likely hate you. You’ll likely
need to build a service or query store solely for them to run sql queries against.
66. @svpember
Reactive + CQRS
• System is ideal for both write- and read-heavy applications
• Service Design via Bounded Contexts
67. Bounded Contexts are an excellent tool to determine Microservice responsibility and potentially sizing of a service. i.e. how big should it be? when do we
create a new one?
We went through a team exercise where we tried to figure this out…
- Each circle represents a context boundary
- big outer circle is 3c itself
- four big inner circles are different functional areas of our company
- small inner inner circles represent contexts further still
What we found was that our services for the most part mapped to the smaller circles, which was great, although there was much duplication (e.g. services
belonging to multiple contexts) and some services we identified should be combined. Those are both bad signs.
68. @svpember
Reactive + CQRS
• System is ideal for both write- and read-heavy applications
• Service Design via Bounded Contexts
• Failure Protection
69. @svpember
• For a sufficiently large system, something is always going to be in a failure state
• reducing or eliminating calls between services when handling a Command or a Query eliminates the dependency on that or those additional
services
• Service being queried will still function
• One of the tenets of the 'Reactive Manifesto'; it provides a stopgap for failures impacting the user.
• For example, if the Shopping Cart Management service is down, my product catalog service should not be affected and the user should still be
able to browse the catalog
• Additionally, using a durable Message Broker for communication grants additional layers of protection. We use RabbitMq, but loads of folks
have great success with tools like Kafka. These tools will hold on to messages, allowing consuming services to consume them at their leisure. This has
advantages in situations where a service is down for a period of time. Or imagine a service cannot handle a message and the devs need to fix it; the message
waits on the queue until the service is back online, resulting in no data loss.
image - broken down service, happy product catalog
70. @svpember
Reactive + CQRS
• System is ideal for both write- and read-heavy applications
• Service Design via Bounded Contexts
• Failure Protection
• Promotes Async comms
• eliminating or reducing calls between services when handling Commands and Queries also eliminates blocking, synchronous calls to these
services
• reduces resource contention on thread pools
• Also eliminates a potential failure vector: services ‘backing up’ with chains of communications
71. @svpember
Reactive + CQRS
• System is ideal for both write- and read-heavy applications
• Service Design via Bounded Contexts
• Failure Protection
• Promotes Async comms
• “Simple” Testing
• I put ‘simple’ in quotes there because no matter what I say next… this is still a distributed system we’re talking about
- Each service can be heavily unit and integration tested in isolation of each other
• Testing of the platform as a whole is important and useful. At least for us, a good chunk, if not most, of our bugs are contract violations in the
incoming commands and events (e.g. I thought event A was emitting two fields but it's actually three and I didn't have JSON ignore properties set in
Jackson, or a service misspelled a variable name, or someone changed event A's fields.)
• To me, this means that our organization is lacking communication, and that Conway's law is being proven true, or however you
want to say it. The point here is to make sure that other teams are aware of the shape of events emitted from your services.
72. @svpember
Reactive + CQRS
• System is ideal for both write- and read-heavy applications
• Service Design via Bounded Contexts
• Failure Protection
• Promotes Async comms
• “Simple” Testing
• Reduces cross-service Querying
One of the most important reasons you do something like this? To avoid having to write queries that would otherwise span multiple services. Has anyone
had to write queries that span multiple services? It's a nightmare. So inefficient.
So, while this approach may feel like you're duplicating data or whatnot, it's quite efficient.
73. @svpember
Reactive + CQRS
• System is ideal for both write- and read-heavy applications
• Service Design via Bounded Contexts
• Failure Protection
• Promotes Async comms
• “Simple” Testing
• Reduces cross-service Querying
• Also…
And it also leads to one of my absolute favorite concepts…
76. @svpember
More specifically, it's an alternative to your standard ORM storage mapping,
where an object in memory maps directly to a row in a database, even if that row may be split via joins.
* an update is made to a model, which updates a column in your database
* in this method, the only thing you know about your data is what it looks like right now.
79. @svpember
This stream of events is persisted in our database in the order they occurred, as a journal of what has transpired.
These events can then be played back against our Domain Object, building it up to the state it would be at any given point in time, although this is most
likely the Current State
81. Never Delete, Never Update
It means you only ever insert data into your database. No event rows are ever, ever, ever updated or deleted. In so doing you've turned your events
table into an append-only journal, which is very efficient for most databases.
In other words, events are immutable!
82. @svpember
Event Sourcing
• Alternate Storage Pattern -> Prevents Data Loss
• Store Deltas
• Additive Only
• Time Series
• It’s Old
Rather, this basic idea of storing deltas, rather than just current state, has been around for a long time.
83. - Every transaction you make with your bank. Every Credit or Debit made is logged, along with an audit trail of who (e.g. which teller) made the change.
- To get your balance, your bank simply adds up each of these transactions
- May also periodically record what the balance was at specific points in time, to prevent having to recalculate everything from the beginning of time.
- Can you imagine if you checked your bank statement and it could only show you your current balance… not how you reached that number?
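The bank-ledger analogy translates directly into code: the balance is never stored as mutable state, it's the fold (sum) of all credit and debit events, optionally starting from a periodic snapshot so we don't replay from the beginning of time. A hypothetical sketch:

```java
// Balance as a fold over the transaction journal, with an optional
// snapshot starting point. Positive amounts are credits, negative debits.
import java.util.List;

public class BankLedger {
    public record Txn(long amountCents) {}

    // Current balance derived purely from the event stream.
    public static long balance(List<Txn> journal) {
        return journal.stream().mapToLong(Txn::amountCents).sum();
    }

    // Snapshot optimization: start from a recorded balance and replay
    // only the transactions that happened after the snapshot.
    public static long balance(long snapshotCents, List<Txn> sinceSnapshot) {
        return snapshotCents + balance(sinceSnapshot);
    }
}
```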
84. Lawyers!
If a contract needs to be adjusted, is the contract thrown out and re-written? No. Rather, ‘addendums’ are placed on the contract.
To figure out what the contract actually involves, one has to read the initial contract and then each successive addendum, to work out what the thing
actually says.
85. @svpember
Event Sourcing
• Alternate Storage Pattern -> Prevents Data Loss
• Store Deltas
• Additive Only
• Time Series
• It’s Old
• Easy To Implement
86. It promotes a highly functional style that’s very easy to unit test.
If I need to get the current state of an entity, it’s as simple as select * from event where id = x, then passing each event into an entity class, which builds
itself up to current state using the functions outlined here
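As a sketch of that select-then-replay pattern (the event names and fields here are invented for illustration; the talk only describes the general shape):

```python
# Rebuild an entity's current state by folding its ordered events.
class Cart:
    def __init__(self):
        self.products = []

    # One small handler per event type: easy to unit test in isolation.
    def apply(self, event):
        if event["type"] == "ProductAdded":
            self.products.append(event["product"])
        elif event["type"] == "ProductRemoved":
            self.products.remove(event["product"])
        return self

def load_cart(events):
    """The in-memory equivalent of: select * from event where id = x,
    then play each row against the entity."""
    cart = Cart()
    for event in sorted(events, key=lambda e: e["revision"]):
        cart.apply(event)
    return cart
```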
87. @svpember
Event Sourcing
• Alternate Storage Pattern -> Prevents Data Loss
• Store Deltas
• Additive Only
• Time Series
• It’s Old
• Easy To Implement
• … Difficult to Grasp
88. Now, this is where ES may start to hurt your brain
90. Objects are backed - ‘sourced’ -
by events
Which is just a fancy way of saying: All Objects are ‘backed’ or ‘sourced’ by various events from the Journal or Event Stream
91. @svpember
Now this has lots of powerful uses which we’ll get into in a bit, but regardless…
92. Can be
difficult for
Junior
Engineers
I’ve found that this entire concept can be a bit difficult for junior developers to grasp at first. So be aware of that. Internal Education can help dramatically.
93. But, why?
I’m sure that I’m making this all sound very attractive. You’re again probably asking yourself… ok, great… but why?
Ok, so why?
99. @svpember
Event Sourcing: Why
• Append-Only
• Prevents Data Loss
• Time Travel
• Perfect For Time Series
• Automatic Audit Log ++
Built in, Automatic Audit Log for your Entities
100. Audit Logs tell the History
Events tell the Intent of History
101. Furthermore, having events as first-class members of your platform can give you enhanced information about what your users or systems are doing,
beyond what might normally get written to the database.
You can create events that don’t necessarily deal with the properties changed by a user, but capture additional actions that may have occurred.
And it’s easier to work with and analyze the data if the events are integrated within your platform already.
102. @svpember
One trivial example is this.
One of our first ES systems was an internal user management system, where our Program Managers (don’t worry about these terms) track prospective
ThirdChannel employees, which we call agents.
Our Managers wanted a way to get history for each potential agent, and because of Event Sourcing, it was about 5 minutes of work to display the history of
each agent like that.
103. @svpember
Event Sourcing: Why
• Append-Only
• Prevents Data Loss
• Time Travel
• Perfect For Time Series
• Automatic Audit Log ++
• Data Mining and Reporting
104. @svpember
And so you may be thinking, ok great: I can get all the relevant events for Bob’s shopping cart, but I’m only ever going to play them all to get his current
state!
And for much of the time, yes, users definitely want to know what their current state is.
The magic, though, comes with the business value. Business loves time-series data.
105. @svpember
Reporting in Event Temporality
• Look at ALL users Product Removed events: which products are being
discarded?
• Find ALL ProductAdded + ProductRemoved event pairings (i.e. the same
product, user) that occur within 5 minutes: perhaps a user is hesitant about
purchasing… maybe offer them a discount?
• Find avg time between ProductAdded and OrderPlaced: how long are users
sitting with non-empty carts?
• Find Trends in all of the above
Because you have access to all the data changes that have happened, along with their time relationships… you can get extremely creative.
Now, I’m not very creative, but here are a few ideas for reports on just users’ shopping cart events…
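The second report above (add/remove pairs within 5 minutes) might be sketched like this. The event shape, field names, and sample stream are all assumptions for illustration, not from the talk:

```python
from datetime import datetime, timedelta

def hesitant_pairs(events, window=timedelta(minutes=5)):
    """Pair each ProductRemoved with the most recent ProductAdded for the
    same (user, product); flag pairs that occur within `window`."""
    added = {}   # (user, product) -> time of the most recent ProductAdded
    pairs = []
    for e in sorted(events, key=lambda e: e["at"]):
        key = (e["user"], e["product"])
        if e["type"] == "ProductAdded":
            added[key] = e["at"]
        elif e["type"] == "ProductRemoved" and key in added:
            if e["at"] - added.pop(key) <= window:
                pairs.append(key)   # candidate for a discount offer
    return pairs

# A tiny, hypothetical slice of the shopping-cart event stream:
t0 = datetime(2018, 1, 1, 12, 0)
stream = [
    {"type": "ProductAdded", "user": "bob", "product": "tv", "at": t0},
    {"type": "ProductRemoved", "user": "bob", "product": "tv", "at": t0 + timedelta(minutes=3)},
    {"type": "ProductAdded", "user": "ann", "product": "tv", "at": t0},
    {"type": "ProductRemoved", "user": "ann", "product": "tv", "at": t0 + timedelta(minutes=20)},
]
```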
106. Business
Types love
Reports
I assure you, if you start showing off or even hinting to your product research teams, your product owner teams, etc that these capabilities could exist
within your platform… they will get very excited
107. Collecting and applying your events like this is known as a ‘projection’. You’re taking events from your various event streams and projecting them into or
onto some data structure.
You can also think of it as a Materialized View.
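A projection can be sketched as a listener that folds events into a plain, read-optimized structure, here reusing the “which products are being discarded?” report idea from earlier. The event shape is assumed for illustration:

```python
# A projection: fold events from the stream into a read-optimized view.
def project(view, event):
    """Update the materialized view in response to one event."""
    if event["type"] == "ProductRemoved":
        product = event["product"]
        view[product] = view.get(product, 0) + 1
    return view

def build_view(events):
    """Rebuilding the projection from scratch is a fold over the stream."""
    view = {}
    for event in events:
        project(view, event)
    return view
```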
108. Is there a performance hit?
Save ALL the events?
So, that all being said..
Two questions I generally receive when talking about this…<read>
The answer to the first is yes, there is a small performance hit if you replay events on each user request. However, we’ve found that it’s fairly
minimal for entities with fewer than a hundred or so events on them. It’s one SQL query to get all events for an entity to see current state, just as it would be
with an ORM. However, we typically follow the CQRS methodology and build current-state projections for the runtime querying that users see.
Now, for the answer to the second, let’s jump to the next section….
110. Event Storage
Ok, with that all in mind, let’s proceed to the next section, Event Storage… let’s talk about how we store all these events.
First, I want to address the two questions I left off with.
111. First, in an event-oriented world, you’re going to acquire quite a few events. It’s going to seem pretty messy after a while. You’ll end up with perhaps larger
database sizes than you originally thought, though remember: this is largely due to the fact that you’re now tracking a third dimension, time, within your
data store.
And really… data storage is CHEAP. Most of us are not Netflix or Facebook, and the scale of events we’ll be working with is very manageable.
112. Now, of course, if this is at all bothersome, you can adopt a compaction strategy. The most well-known is snapshotting, where you compress related
events older than, say, two years into one object, then extract the raw events and put them into cheaper long-term storage like AWS Glacier. Still, never
throw them away.
113. @svpember
You can also use snapshotting more frequently, as a mechanism to alleviate performance troubles.
Make a snapshot on some interval, say every week… or every 100 events. You load the snapshot first, then find all events since the snapshot was taken.
The issue here is that this adds an additional query to fetch the snapshot, so it’s only worthwhile if processing the events takes longer
than the additional database query. If your events are applied with pure functions, it will take a fairly high number of events to be slower than the db query.
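The snapshot-then-replay load described above might look like this in miniature; the snapshot and event shapes are assumptions for illustration:

```python
# Load current state from a snapshot plus the events recorded after it.
def load_with_snapshot(snapshot, events):
    state = dict(snapshot["state"])        # start from the cached state
    since = snapshot["revision"]
    for event in sorted(events, key=lambda e: e["revision"]):
        if event["revision"] > since:      # replay only newer events
            state[event["field"]] = event["value"]
    return state
```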
114. Event Schema
Anyway, let’s discuss what an event looks like on disk.
And by that, I mean… physically, what does our database schema look like?
With event sourcing, there’s no real ‘correct’ way of doing things. You can use any type of data storage for your events, although we at
ThirdChannel prefer Postgres and have had some experience with Cassandra.
Anyway. I think there are generally two approaches to what an event looks like on disk.
It’s either a table per event type, or one event table:
116. list fields
An event, at minimum, must have some identifier (we use UUIDs), a link to the entity id this event belongs to, the revision number of the event (this
helps keep events ordered and supports things like optimistic locking), and the id of the user who made the change.
Old price, new price, and currency are all specific to this event.
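Those minimum fields, plus the price-change-specific ones, might be modelled like so; all names here are illustrative, not taken from an actual schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)    # frozen, because events are immutable
class PriceChangedEvent:
    id: str                # event identifier (a UUID in practice)
    entity_id: str         # the entity this event belongs to
    revision: int          # ordering + optimistic locking
    user_id: str           # who made the change (the audit trail)
    created_at: datetime
    # fields specific to this particular event type:
    old_price: int
    new_price: int
    currency: str
```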
117. @svpember
Or, one event table. In this scenario, as you may imagine, ALL your events are in one table. It’s easy to, say, shard this table by date or something, but
basically: one event table.
Here, our events are essentially schema-less. Or rather, there is an implicit schema: as your events are parsed or deserialized from disk, the application
applies that schema while mapping each row onto an event object.
118. example db schema
does anyone here not know about the jsonb datatype?
Seriously, write this down. go look at this thing. switch to using Postgres entirely for it.
It’s a better document store than mongoDB. I don’t like MongoDB very much, although admittedly I haven’t given it a fair shake these past few years.
119. @svpember
Recommend: One Event Table
• No Migrations, past the first (+)
• Trivial to Add New Events (+)
• Selecting Multiple Event Types for a single Entity in one Query, no joins (++)
• ProTip: Use Postgres and the jsonb data type (+)
• Querying across multiple event types with no joins (+)
• Zero to Minimal Database Constraints (-)
My recommendation is one event table; the pluses and minuses are marked above.
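With one table and a jsonb payload column, the implicit schema lives in application code at deserialization time. A sketch, with invented names and a plain-tuple stand-in for a database row:

```python
import json

# Which payload fields each event type expects: the implicit schema.
EVENT_FIELDS = {
    "PriceChanged": ["old_price", "new_price", "currency"],
}

def deserialize(row):
    """row mimics (event_type, data) columns read from the one event
    table, where data is the jsonb payload as text."""
    event_type, payload = row
    data = json.loads(payload)
    missing = [f for f in EVENT_FIELDS[event_type] if f not in data]
    if missing:
        # with minimal database constraints, validation happens here
        raise ValueError(f"{event_type} payload missing {missing}")
    return {"type": event_type, **data}
```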
120. Data Locality & Service
Lifecycle
This title may be strange but bear with me.
So, now that we’ve chosen the schema for events… the question still stands: how does this change in light of a distributed environment? If I’ve been
arguing that we can have multiple data stores within our system, and each of our services is generating events… well, where should these events physically
live?
121. @svpember
It’s a bit of a spectrum.
Given the assumption that each service has a data store, I think there are two basic ‘pure’ strategies for event storage.
122. Service - Local Storage
On one end, each service is responsible for storing a certain set of domain events. Basically, anything a service emits, it should also store. This of
course requires that each service have its own data store, and it generally operates in a way you may be accustomed to. In addition to the events, you’ll
likely have models or materialized views representing the current state that are updated by the events.
123. Central Store
• On the other end is the central event store. It has some mechanism to listen to all events that are emitted within your system and then saves
them to one general data store. Conceptually, one service writes to that store, but it can handle read requests from other services
(single writer, multiple readers).
◦ What’s interesting about this pattern architecturally is that it opens up your services to not needing a local data store at all… they would be
entirely event driven and hold their entire state in a local cache. It may sound crazy, but it’s entirely feasible in this structure.
Anyway. which approach to pick?… I don’t think there’s a right answer, it’s really what’s comfortable for you and your team. What can help, though, is
looking at different lifecycle moments your services can go through, to better illustrate each approach
124. @svpember
Event Storage Workflow Scenarios
• How Does a Service access events?
First, the most basic. What happens when a service wants to access events?
127. @svpember
Central Store
<walk through>
In this scenario, I would advocate skipping routing the requests through the message queue. The central store should be fairly prominent; direct TCP/HTTP queries
should be just fine.
With the distributed scenario, services may come and go, and you typically broadcast the query for events without knowing exactly who or what
contains those events. With the central store, you do know, so you can typically skip the message queue if you’d like.
128. @svpember
Event Storage Workflow Scenarios
• How Does a Service access events?
• What happens when we bring up a new service?
What happens when we bring up a new service?
In this scenario:
- new service
- empty
- needs various events from different domains in order to bring itself into alignment with the current state of the other services
132. @svpember
Event Storage Workflow Scenarios
• How Does a Service access events?
• What happens when we bring up a new service?
• What happens when a service misses or fails to process an event?
What happens when a service misses an event?
I’d argue that it’s difficult to ‘miss’ an event:
if you’re using a message broker, it should hold on to messages until services can read them.
But perhaps there’s some catastrophic failure and you lose messages on the queue.
What’s more likely is that your service won’t know how to handle a particular event.
Either way, what we’re really talking about here is the ability to reprocess events.
135. @svpember
Event Storage Workflow Scenarios
• How Does a Service access events?
• What happens when we bring up a new service?
• What happens when a service misses or fails to process an event?
• What about out-of-order events?
136. @svpember
Event Storage Workflow Scenarios
• How Does a Service access events?
• What happens when we bring up a new service?
• What happens when a service misses or fails to process an event?
• What about out-of-order events?
• What is the process for decommissioning a service?
137. @svpember
Decommission - Distributed
• Are we bringing up a new service?
• Are we simply killing functionality?
• Don’t get rid of the events!
With Distributed, you need a plan for what to do with the events you have in the system you’re shutting down.
- If you’re replacing an old service with new, refactored version of it, you should be fine, so long as the new version knows how to respond to the same
requests for data and to handle the same commands and events
- If you’re killing the service entirely, that likely means the functionality is also going away.
- Still, something needs to be responsible for the events to support old requests.
- Consider offloading the events and any query mechanisms to the most closely related service
138. @svpember
Decommission - Centralized
• … just delete the service
With Centralized, the process is much easier. If you know the service isn’t needed anymore… well, it’s gone. The events it was responsible for creating are still
in the central store should you need them.
140. @svpember
Per-Service Storage
• Less infrastructure
• Proper containment of events
• Requires multiple event consumers and ‘rebroadcast’ mechanisms
141. @svpember
Centralized Storage
• Much larger footprint
• Convenient
• Violates ‘self-containedness’ and distribution of events
• Easier for mining purposes
• One rebroadcast mechanism
145. @svpember
Event Storming Building Blocks
• Events
• Reactions - “Whenever an account is created, I need to send an email”
• System Commands
• User-Initiated Commands
• External Systems
• Policy - Flow of Events and Reactions