CQRS and Event Sourcing
with MongoDB and PHP
At Onebip we developed a reporting system based on CQRS (Command Query Responsibility Segregation) and Event Sourcing using MongoDB.

In this talk I introduce the concepts behind CQRS and Event Sourcing, then describe our path, the technical and conceptual challenges we faced, the strengths of our solution, and the parts where there is room for improvement.

Published in: Software


  1. CQRS and Event Sourcing with MongoDB and PHP
  2. About me: Davide Bellettini ● Developer at Onebip ● TDD addict ● @SbiellONE ● about.bellettini.me
  3. What is this talk about
  4. A little bit of context
  5. About Onebip: mobile payment platform. Start-up born in 2005, acquired by Neomobile group in 2011. Onebip today: 70 countries, 200+ carriers, 5 billion potential users.
  6. It all started with a monolith: LAMP stack
  7. To a distributed system: self-contained services communicating via REST
  8. Modern services: first-class modern NoSQL distributed databases
  9. But the monolith is still there
  10. The problem: a reporting horror story
  11. "We need three new reports!" (Manager) "Sure, no problem!"
  12. Deal with the legacy SQL schema
  13. Deal with MongoDB
  14. A little bit of queries here, a little bit of map-reduce there
  15. 1 month later...
  16. Reports are finally ready!
  17. until...
  18. "Your queries are killing production!" (SysAdmin)
  19. Still not enough! Heavy query optimization, adding indexes
  20. Let's reuse data from other reports (don't do that)
  21. DB is ok, reports delivered.
  22. but then...
  23. "Houston, we have a problem. Reports are not consistent (with other reports)" (Business guy)
  24. Mistakes were made
  25. Lessons learned
  26. #1 Avoid multiple sources of truth: it's hard to compare different data in a distributed system split across multiple domains
  27. #2 Ubiquitous language: same words, different concepts across domains
  28. #3 Fault tolerance to change: changing a report shouldn't have side effects
  29. Most common solutions
  30. #1 ETL + Map-Reduce
  31. #2 Data Warehouse + Consultants
  32. #3 Mad science (Yeppa!)
  33. What we wanted
  34. Must have: no downtime in production; consistent across domains
  35. Nice to have: a system elastic enough to extract any metric; real-time data
  36. In DDD we found the light
  37. CQRS and Event Sourcing
  38. Command-query responsibility segregation (CQRS)
  39. Commands: anything that happens in one of your domains is triggered by a command and generates one or more events. Order received -> payment sent -> items queued -> confirmation email sent
  40. Query: generate read models from events depending on how the data actually needs to be used (by users and other application internals)
  41. Event Sourcing: "The fundamental idea of Event Sourcing is that of ensuring every change to the state of an application is captured in an event object, and that these event objects are themselves stored in the sequence they were applied." (Martin Fowler)
  42. Unrolling a stream of events: starting from the beginning of time, you are literally unrolling history to reach the state at a given time
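"Unrolling history" can be pictured as a fold over the event stream. The following is only a minimal in-memory sketch: the `applyEvent` and `replay` functions and the state fields (`registered`, `purchases`, `spent`) are hypothetical names chosen for illustration and do not come from the talk.

```php
<?php
declare(strict_types=1);

/**
 * Applies one event to the current state and returns the new state.
 * Event types and state fields are hypothetical, for illustration only.
 */
function applyEvent(array $state, array $event): array
{
    switch ($event['type']) {
        case 'user-registered':
            $state['registered'] = true;
            break;
        case 'user-purchased':
            $state['purchases'] = ($state['purchases'] ?? 0) + 1;
            $state['spent'] = ($state['spent'] ?? 0) + $event['payload']['amount'];
            break;
    }
    return $state;
}

/** Rebuilds state by folding over the events in the order they were stored. */
function replay(array $events): array
{
    return array_reduce($events, 'applyEvent', []);
}

$events = [
    ['type' => 'user-registered', 'payload' => ['user_id' => 123]],
    ['type' => 'user-purchased',  'payload' => ['user_id' => 123, 'amount' => 20]],
    ['type' => 'user-purchased',  'payload' => ['user_id' => 123, 'amount' => 5]],
];

// Current state: replay the whole stream.
$state = replay($events);

// State at an earlier point in time: replay only a prefix of the stream.
$stateAfterFirstPurchase = replay(array_slice($events, 0, 2));
```

No state is stored anywhere in this sketch; it is always derived from the events, which is the point of Idea #3 in the slides that follow.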
  43. Idea #1: every change to the state of your application is captured in an event object. "UserLoggedIn", "PaymentSent", "UserLanded"
  44. Idea #2: events are stored in the sequence they were applied inside an event store
  45. Idea #3: everything is an event. No more state.
  46. Idea #4: one way to store data/events but potentially infinite ways to read them. A practical example: tech ops, business control, monitoring, and accounting are all interested in reading data from different views.
  47. Healthy NoSQL
  48. You start with this
        {
          "_id": ObjectId("123"),
          "username": "Flash",
          "city": …,
          "phone": …,
          "email": …,
        }
  49. The more successful your company is, the more people … The more people, the more views
  50. With document databases it's magically easy to add new fields to your collections.
  51. Soon you might end up with
        {
          "_id": ObjectId("123"),
          "username": "Flash",
          "city": …,
          "phone": …,
          "email": …,
          "created_at": …,
          "updated_at": …,
          "ever_tried_to_purchase_something": …,
          "canceled_at": …,
          "acquisition_channel": …,
          "terminated_at": …,
          "latest_purchase_date": …,
          …
        }
  52. A bomb waiting to detonate: it's impossible to keep adding state changes to your documents and then expect to be able to extract them with a single query.
  53. Exploring tools
  54. Event Store ● Engineered for event sourcing ● Supports projections ● By the father of CQRS (Greg Young) ● Great performance ● http://geteventstore.com/ The bad: based on Mono, still too unstable.
  55. LevelWHEN: an event store built with Node.js and LevelDB ● Faster than light ● Completely custom, no tools to handle aggregates ● https://github.com/gabrielelana/levelWHEN
  56. The known path ● PHP (any other language would do just fine) ● MongoDB 2.2.x
  57. Why MongoDB: events are not relational; scales well; awesome aggregation framework
  58. Hands on
  59. Storing events
  60. The write architecture: several Services each send an [event payload] to the Queue System
        Service --- Queue System <------------> API -> MongoDB
  61. Queues: Recruiter - https://github.com/gabrielelana/recruiter
  62. MongoDB replica set: a MongoDB replica set with two logical dbs: 1. Event store, where we would store events 2. Reporting DB, where we would store aggregates and final reports
  63. Anatomy of an event 1/2
        {
          '_id' : '3318c11e-fe60-4c80-a2b2-7add681492d9',
          'type': 'an-event-type',
          'data': {
            'meta' : { … },
            'payload' : { … }
          }
        }
  64. Anatomy of an event 2/2
        'meta' : {
          'creation_date': ISODate("2014-11-21T00:00:01Z"),
          'saved_date': ISODate("2014-11-21T00:00:02Z"),
          'source': 'some-bounded-context',
          'correlation_id': 'a-correlation-id'
        },
        'payload' : {
          'user_id': '1234',
          'animal': 'unicorn',
          'colour': 'pink',
          'purchase_date': ISODate("2014-11-21T00:00:00Z"),
          'price': '20/fantaeuros'
        }
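A producing service could assemble this envelope along the following lines. This is a sketch under assumptions: `uuidV4` and `buildEvent` are hypothetical helper names, the timestamp is a plain ISO 8601 string rather than a driver date type, and `saved_date` is left out on the assumption that it is filled in when the event is actually saved.

```php
<?php
declare(strict_types=1);

/** Generates a random (version 4) UUID to use as the client-side event id. */
function uuidV4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set version to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set RFC 4122 variant
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

/**
 * Wraps a payload in the envelope shown in the slides.
 * Hypothetical helper: 'saved_date' is omitted on the assumption that
 * the storing side adds it.
 */
function buildEvent(string $type, string $source, string $correlationId, array $payload): array
{
    return [
        '_id' => uuidV4(),
        'type' => $type,
        'data' => [
            'meta' => [
                'creation_date' => gmdate('Y-m-d\TH:i:s\Z'),
                'source' => $source,
                'correlation_id' => $correlationId,
            ],
            'payload' => $payload,
        ],
    ];
}

$event = buildEvent(
    'an-event-type',
    'some-bounded-context',
    'a-correlation-id',
    ['user_id' => '1234', 'animal' => 'unicorn', 'colour' => 'pink']
);
```

Generating the `_id` on the producer side is what makes the event safe to resend unchanged if delivery is uncertain.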
  65. Don't trust the network: idempotence
        { '_id' : '3318c11e-fe60-4c80-a2b2-7add681492d9', … }
      The _id field is defined client side and ensures idempotence if an event is received twice.
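MongoDB enforces uniqueness on `_id`, so inserting the same event a second time fails with a duplicate key error instead of creating a second copy. The sketch below imitates that behaviour with an in-memory class so that it runs without a database; `InMemoryEventStore` is a hypothetical stand-in, not the store used at Onebip.

```php
<?php
declare(strict_types=1);

/**
 * In-memory stand-in for the events collection (hypothetical, for
 * illustration only). Like the unique _id index, it refuses a second
 * document whose _id has already been stored.
 */
final class InMemoryEventStore
{
    /** @var array<string, array> events keyed by their client-side _id */
    private array $events = [];

    /** Returns true if the event was stored, false if it was a duplicate. */
    public function append(array $event): bool
    {
        $id = $event['_id'];
        if (isset($this->events[$id])) {
            return false; // redelivery: already stored, safely ignored
        }
        $this->events[$id] = $event;
        return true;
    }

    public function count(): int
    {
        return count($this->events);
    }
}

$store = new InMemoryEventStore();
$event = ['_id' => '3318c11e-fe60-4c80-a2b2-7add681492d9', 'type' => 'an-event-type'];

$first = $store->append($event);
$second = $store->append($event); // the same message delivered twice
```

With a real collection, the consumer would presumably treat the duplicate key error as "already stored" and acknowledge the message rather than retry it.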
  66. Indexes ● Events collection is huge (~100*N documents) ● Use indexes wisely, as they are necessary yet expensive ● With the suggested event structure: {'data.meta.created_at': 1, 'type': 1}
  67. Benchmarking: how many events/second can you store? Our machines were able to store roughly 150 events/sec. This number can be greatly increased with dedicated IOPS, more aggressive inserting policies, etc.
  68. Final tips ● Use SSDs on your storage machines ● Pay attention to write concerns (w=majority) ● Test your replica set fault tolerance
  69. From events to meaningful metrics
  70. The event processing pipeline: Sequential Projector -> Event Mapper -> Projection -> Aggregation
  71. A real-life problem: what is the conversion rate of our registered users?
  72. #1 The registration event
        {
          '_id' : '3318c11e-fe60-4c80-a2b2-7add681492d9',
          'type': 'user-registered',
          'data': {
            'meta' : {
              'save_date': ISODate("2014-11-21T00:00:09Z"),
              'created_at': ISODate("2014-11-21T00:00:01Z"),
              'source': 'core-domain',
              'correlation_id': 'user-123456'
            },
            'payload' : {
              'user_id': 123,
              'username': 'flash',
              'email': 'a-dummy-email@gmail.com',
              'country': 'IT'
            }
          }
        }
  73. #2 The purchase event
        {
          '_id' : '3318c11e-fe60-4c80-a2b2-7add681492d9',
          'type': 'user-purchased',
          'data': {
            'meta' : {
              'save_date': ISODate("2014-11-21T00:10:09Z"),
              'created_at': ISODate("2014-11-21T00:10:01Z"),
              'source': 'payment-gateway',
              'correlation_id': 'user-123456'
            },
            'payload' : {
              'user_id': 123,
              'email': 'a-dummy-email@gmail.com',
              'amount': 20,
              'value': 'EUR',
              'payment': 'credit_card',
              'item': 'fluffy cat'
            }
          }
        }
  74. Sequential projector 1/2
        []->[x]->[]->[x]->[]->[]->[]->[]
        |--------------| |------------|
                ---> Projector
      Divides the stream of events into batches, filters events by type, and passes those of interest to the mapper.
  75. Sequential projector 2/2 ● It's a good idea to select fixed-size batches to avoid memory problems when you load your Cursor in memory ● Could be a long-running process selecting events as they arrive in real time
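The batching and filtering described in the two projector slides might look roughly like this. It is a sketch, not the original projector: `batches` and `filterByType` are hypothetical names, and a plain array stands in for the database cursor.

```php
<?php
declare(strict_types=1);

/** Keeps only the events whose type is in $types. */
function filterByType(array $batch, array $types): array
{
    return array_values(array_filter(
        $batch,
        fn(array $event): bool => in_array($event['type'], $types, true)
    ));
}

/**
 * Reads events in fixed-size batches and yields each batch after
 * filtering it by type. $events may be any iterable (for example a
 * cursor), so only one batch is held in memory at a time.
 */
function batches(iterable $events, array $types, int $batchSize): Generator
{
    $batch = [];
    foreach ($events as $event) {
        $batch[] = $event;
        if (count($batch) === $batchSize) {
            yield filterByType($batch, $types);
            $batch = [];
        }
    }
    if ($batch !== []) {
        yield filterByType($batch, $types); // last, partial batch
    }
}

$stream = [
    ['_id' => 1, 'type' => 'user-landed'],
    ['_id' => 2, 'type' => 'user-registered'],
    ['_id' => 3, 'type' => 'user-logged-in'],
    ['_id' => 4, 'type' => 'user-purchased'],
    ['_id' => 5, 'type' => 'user-registered'],
];

$result = iterator_to_array(
    batches($stream, ['user-registered', 'user-purchased'], 2)
);
```

Each yielded batch would then be handed to the event mapper. A long-running variant would keep polling for events newer than the last one it processed.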
  76. Event mapper 1/3: translates event fields to the Read Model domain. Takes an event as input, applies a bunch of logic, and returns a list of Read Model fields.
  77. Event mapper 2/3. Input event: user-registered. Output:
        $output = [
            'user_id' => 123, // simply copied
            'user_name' => 'flash', // simply copied
            'email' => 'a-dummy-email@gmail.com', // simply copied
            'registered_at' => "2014-11-21T00:00:01Z" // from the data.meta.created_at event field
        ];
  78. Event mapper 3/3. Input event: user-purchased. Output:
        $output = [
            'user_id' => 123, // simply copied
            'email' => 'a-dummy-email@gmail.com', // simply copied
            'purchased_at' => "2014-11-21T00:10:01Z" // from the data.meta.created_at event field
        ];
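A mapper producing the two outputs above could be as small as the function below. The name `mapEvent` is hypothetical, and timestamps are kept as plain strings here on the assumption that date handling is done elsewhere.

```php
<?php
declare(strict_types=1);

/**
 * Maps an event to the read-model fields it contributes.
 * Returns an empty array for event types this read model ignores.
 */
function mapEvent(array $event): array
{
    $meta = $event['data']['meta'] ?? [];
    $payload = $event['data']['payload'] ?? [];

    switch ($event['type']) {
        case 'user-registered':
            return [
                'user_id' => $payload['user_id'],         // simply copied
                'user_name' => $payload['username'],      // renamed for the read model
                'email' => $payload['email'],             // simply copied
                'registered_at' => $meta['created_at'],   // from data.meta.created_at
            ];
        case 'user-purchased':
            return [
                'user_id' => $payload['user_id'],         // simply copied
                'email' => $payload['email'],             // simply copied
                'purchased_at' => $meta['created_at'],    // from data.meta.created_at
            ];
        default:
            return [];
    }
}

$registered = mapEvent([
    'type' => 'user-registered',
    'data' => [
        'meta' => ['created_at' => '2014-11-21T00:00:01Z'],
        'payload' => ['user_id' => 123, 'username' => 'flash', 'email' => 'a-dummy-email@gmail.com', 'country' => 'IT'],
    ],
]);

$purchased = mapEvent([
    'type' => 'user-purchased',
    'data' => [
        'meta' => ['created_at' => '2014-11-21T00:10:01Z'],
        'payload' => ['user_id' => 123, 'email' => 'a-dummy-email@gmail.com', 'amount' => 20],
    ],
]);
```

Fields such as `country` and `amount` are dropped on purpose: this read model only needs what the conversion-rate report uses.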
  79. Projection: essentially it is your read model, the data that the business is interested in.
  80. The Projection after event #1
        db.users_conversion_rate_projection.findOne()
        {
          'user_id': 123,
          'user_name': 'flash',
          'email': 'a-dummy-email@gmail.com',
          'registered_at': "2014-11-21T00:00:01Z"
        }
  81. The Projection after event #2
        {
          'user_id': 123,
          'user_name': 'flash',
          'email': 'a-dummy-email@gmail.com',
          'registered_at': "2014-11-21T00:00:01Z",
          'purchased_at': "2014-11-21" // Added this field and rewrote others
        }
  82. The Projection collection
        {
          'user_id': 123,
          'user_name': 'flash',
          'email': 'a-dummy-email@gmail.com',
          'registered_at': "2014-11-21",
          'purchased_at': "2014-11-21" // Added this field and rewrote others
        }
        {
          'user_id': 456,
          'user_name': 'batman',
          'email': 'a-dummy-email@gmail.com',
          'registered_at': "2014-11-21",
          'purchased_at': "2014-11-21" // Added this field and rewrote others
        }
        {
          'user_id': 789,
          'user_name': 'superman',
          'email': 'a-dummy-email@gmail.com',
          'registered_at': "2014-12-21",
          'purchased_at': "2014-12-21" // Added this field and rewrote others
        }
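The way a projection document grows from one event to the next can be imitated in memory with a merge keyed by `user_id`. This is a sketch under assumptions: `upsert` is a hypothetical helper, and against MongoDB the equivalent would presumably be an update with the upsert option and a `$set` of the mapped fields.

```php
<?php
declare(strict_types=1);

/**
 * Merges mapped fields into the projection row for the same user_id,
 * creating the row if it does not exist yet (an in-memory analogue of
 * an upsert). Existing fields are overwritten, new ones are appended.
 */
function upsert(array $projection, array $fields): array
{
    $userId = $fields['user_id'];
    $projection[$userId] = array_merge($projection[$userId] ?? [], $fields);
    return $projection;
}

$projection = [];

// Fields mapped from the user-registered event.
$projection = upsert($projection, [
    'user_id' => 123,
    'user_name' => 'flash',
    'email' => 'a-dummy-email@gmail.com',
    'registered_at' => '2014-11-21T00:00:01Z',
]);

// Fields mapped from the user-purchased event: adds purchased_at and
// rewrites user_id and email with the same values.
$projection = upsert($projection, [
    'user_id' => 123,
    'email' => 'a-dummy-email@gmail.com',
    'purchased_at' => '2014-11-21T00:10:01Z',
]);
```

Because each write only sets fields, replaying the same mapped event twice leaves the row unchanged, which keeps the projection safe to rebuild from the event store.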
  83. The Projection, a few thoughts: note that we didn't copy all the available fields from the events to the projection, just the relevant ones.
  84. From these two events we could have generated infinite read models, such as ● List all purchased products and related amounts for the company buyers ● Map all sales and revenues for our accounting dept ● List transactions for the financial department
  85. One way to write, infinite ways to read!
  86. The aggregation (1) - Total registered users
        var registered = db.users_conversion_rate_projection.aggregate([
          {
            $match: {
              "registered_at": {
                $gte: ISODate("2015-11-21"),
                $lte: ISODate("2015-11-22")
              }
            }
          },
          { $group: { _id: { }, count: { $sum: 1 } } }
        ]);
  87. The aggregation (2) - Users with a purchase
        var purchased = db.users_conversion_rate_projection.aggregate([
          {
            $match: {
              "registered_at": {
                $gte: ISODate("2015-11-21"),
                $lte: ISODate("2015-11-22")
              },
              "purchased_at": { $exists: true }
            }
          },
          { $group: { _id: { }, count: { $sum: 1 } } }
        ]);
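The conversion rate asked for earlier is the second count divided by the first. The sketch below computes the same ratio over in-memory projection rows so that it runs without a database; `conversionRate` is a hypothetical function and the sample rows are made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Conversion rate of the users registered in [$from, $to]: the share
 * of them that has a purchased_at field. Dates are compared as ISO 8601
 * strings, which sort chronologically when they share one format.
 */
function conversionRate(array $rows, string $from, string $to): float
{
    // Same condition as the $match on registered_at.
    $registered = array_filter(
        $rows,
        fn(array $row): bool => strcmp($row['registered_at'], $from) >= 0
            && strcmp($row['registered_at'], $to) <= 0
    );
    if ($registered === []) {
        return 0.0; // no registrations in range, avoid division by zero
    }
    // Same condition as "purchased_at": { $exists: true }.
    $purchased = array_filter(
        $registered,
        fn(array $row): bool => isset($row['purchased_at'])
    );
    return count($purchased) / count($registered);
}

// Made-up sample rows, for illustration only.
$rows = [
    ['user_id' => 123, 'registered_at' => '2014-11-21', 'purchased_at' => '2014-11-21'],
    ['user_id' => 456, 'registered_at' => '2014-11-21'],
    ['user_id' => 321, 'registered_at' => '2014-11-22', 'purchased_at' => '2014-11-23'],
    ['user_id' => 654, 'registered_at' => '2014-11-22'],
    ['user_id' => 789, 'registered_at' => '2014-12-21', 'purchased_at' => '2014-12-21'],
];

$rate = conversionRate($rows, '2014-11-21', '2014-11-22');
```

Four users registered in the range and two of them purchased, so `$rate` is 0.5; the fifth user is outside the range and is not counted.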
  88. The aggregation (3) - Automate all the things ● You can easily create the aggregation framework statement by composition, abstracting the concept of Column ● This way you can dynamically aggregate your projections on, for example, an API request ● If your Projector is a long-running process, your projections will be updated to the second and you automagically get real-time data
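One possible reading of "abstracting the concept of Column" is that each column contributes its own extra `$match` conditions to a shared pipeline template. The sketch below only builds the pipeline arrays and does not talk to MongoDB; `column` and `countPipeline` are hypothetical names, dates are plain strings where a real driver would need its own date type, and `_id => null` is used as the "group everything" key in place of the empty document in the slides.

```php
<?php
declare(strict_types=1);

/** A "column" is a name plus the extra $match conditions that define it. */
function column(string $name, array $conditions = []): array
{
    return ['name' => $name, 'conditions' => $conditions];
}

/** Builds the count pipeline for one column over a registration date range. */
function countPipeline(array $column, string $from, string $to): array
{
    $match = array_merge(
        ['registered_at' => ['$gte' => $from, '$lte' => $to]],
        $column['conditions']
    );
    return [
        ['$match' => $match],
        // _id => null puts every matched document in a single group.
        ['$group' => ['_id' => null, 'count' => ['$sum' => 1]]],
    ];
}

$columns = [
    column('registered'),
    column('purchased', ['purchased_at' => ['$exists' => true]]),
];

// One pipeline per column, e.g. built on demand for an API request.
$pipelines = [];
foreach ($columns as $col) {
    $pipelines[$col['name']] = countPipeline($col, '2015-11-21', '2015-11-22');
}
```

Adding a new metric then means adding one `column(...)` entry rather than writing another aggregation by hand.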
  89. Another use of events: business & tech monitoring
  90. Beware of the beast! No silver bullet
  91. Events are expensive: they require a lot of TIME to be parsed
  92. Events are expensive: you will end up with a billion-document collection (and counting)
  93. Fixing wrong events is painful
  94. Events are complex
  95. Moving events around is horribly painful
  96. Mongo won't help you: actually it will make your life incredibly difficult with hidden bugs and lacking documentation.
  97. Improvements ● Upgrade from MongoDB 2.2.x to 3.0.x ● Switch to the WiredTiger storage engine to save space
  98. Credits: based on a talk by Jacopo Nardiello ● Slides: http://bit.ly/es-nardiello-2014 ● Video: https://vimeo.com/113370688
  99. Q&A
  100. Thank you! @SbiellONE ● about.bellettini.me
