OrientDB distributed architecture 1.1
1. rev 1.1
Distributed architecture with a Multi-Master approach
Available in version 1.0 (planned for December 2011)
www.orientechnologies.com Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 1 of 41
2. Where is the previous OrientDB Master/Slave architecture?
4. After the first tests we decided to throw away the old Master-Slave architecture because it was against the OrientDB philosophy:
it doesn't scale
and
it's hard to configure properly
5. So what's next?
We've redesigned the entire distributed architecture to work as Multi-Master*, to be released in version 1.0 (December 2011).
*http://en.wikipedia.org/wiki/Multi-master_replication
6. In the Multi-Master architecture:
any node can read/write to the database
it scales horizontally
adding nodes is straightforward
Say wow!
8. ...but you have to fight with conflicts
9. Fortunately we found some smart ways to resolve conflicts without falling into a bloodbath
10. The actors
Leader Node: only 1 Leader per cluster. It checks the other nodes and notifies changes to the Peer Nodes. It can be any server node in the cluster, usually the first to start.
Peer Node: any server node in the cluster. It has a permanent connection to the Leader Node.
Client: clients connect to Server Nodes, whether Leader or Peer.
Database: the database, where data are stored.
Synchronous mode replication: the server node propagates changes, waits for the response from the remote server, then sends the ACK to the client.
Asynchronous mode replication: the server node propagates changes and sends the ACK to the client without waiting for the response from the remote server.
11. How is the cluster of nodes composed and managed?
12. Cluster auto-discovery
At startup each Server Node sends an IP Multicast message to discover whether a Leader Node is available to join the cluster. If one is available, the Leader Node connects to the new node, which becomes a Peer Node; otherwise the new node becomes the Leader Node.
[Diagram: Server #1 (Leader) and Server #2 (Peer), each hosting several databases]
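The start-up decision above can be sketched as follows. This is a minimal illustration, not OrientDB code: `ServerNode` and the `discover_leader` callback are hypothetical, and the callback stands in for the multicast probe (it returns the ID of a Leader that answered, or `None`), so no real network traffic is involved.

```python
from typing import Callable, Optional


class ServerNode:
    """Hypothetical model of a Server Node choosing its role at start-up."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.role: Optional[str] = None
        self.leader_id: Optional[str] = None

    def start(self, discover_leader: Callable[[], Optional[str]]) -> str:
        # discover_leader stands in for the IP multicast probe: it returns the
        # ID of a Leader Node that answered, or None if nobody answered.
        leader = discover_leader()
        if leader is None:
            # No Leader available: this node becomes the Leader.
            self.role, self.leader_id = "leader", self.node_id
        else:
            # A Leader answered: join its cluster as a Peer Node.
            self.role, self.leader_id = "peer", leader
        return self.role


first = ServerNode("192.168.0.10:2424")
print(first.start(lambda: None))            # leader
second = ServerNode("192.168.0.11:2424")
print(second.start(lambda: first.node_id))  # peer
```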
13. One Leader, multiple Peers
The first node to start is always the Leader, but in case of failure any other node can be elected. The Leader Node polls all the servers to verify their status and alerts all the Peer Nodes at every change in the cluster composition.
[Diagram: Server #1 (Leader) connected to Server #2 (Peer) and Server #3 (Peer), each hosting several databases]
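The Leader's polling loop can be sketched like this, assuming a hypothetical `is_alive` status check and recording each alert in a list instead of sending it over the network. The point it shows is that Peers are alerted only when the set of live servers actually changes.

```python
from typing import Callable, Dict, List, Set


class LeaderNode:
    """Hypothetical Leader that polls servers and alerts Peers on changes."""

    def __init__(self, is_alive: Callable[[str], bool]):
        self.is_alive = is_alive          # stands in for a real status check
        self.members: Set[str] = set()
        self.alerts: List[Set[str]] = []  # each notification sent to the Peers

    def poll(self, servers: List[str]) -> bool:
        alive = {s for s in servers if self.is_alive(s)}
        if alive != self.members:
            # Cluster composition changed: alert every Peer Node.
            self.members = alive
            self.alerts.append(set(alive))
            return True
        return False


status: Dict[str, bool] = {"s2": True, "s3": True}
leader = LeaderNode(lambda s: status[s])
print(leader.poll(["s2", "s3"]))  # True: first composition
print(leader.poll(["s2", "s3"]))  # False: nothing changed
status["s3"] = False
print(leader.poll(["s2", "s3"]))  # True: s3 left the cluster
```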
14. Asymmetric clustering
Each database can be clustered across multiple server nodes. Databases can be moved across servers. The replication strategy has per-database/server granularity.
This means you could have Server #2 replicating database B asynchronously to Server #3 and database A synchronously to Server #1.
[Diagram: databases A, B and C distributed asymmetrically across Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer)]
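Per-database/server granularity amounts to keying the replication mode on (source server, database, target server). The table below is a hypothetical sketch, not OrientDB's configuration format; it encodes only the example from the slide.

```python
from enum import Enum
from typing import Dict, Tuple


class Mode(Enum):
    SYNC = "synch"
    ASYNC = "asynch"


# Hypothetical map: (source server, database, target server) -> mode.
# It encodes the slide's example: Server #2 replicates database B
# asynchronously to Server #3 and database A synchronously to Server #1.
REPLICATION: Dict[Tuple[str, str, str], Mode] = {
    ("server2", "B", "server3"): Mode.ASYNC,
    ("server2", "A", "server1"): Mode.SYNC,
}


def targets(source: str, database: str) -> Dict[str, Mode]:
    """Return the replication targets and modes for one database on one server."""
    return {
        target: mode
        for (src, db, target), mode in REPLICATION.items()
        if src == source and db == database
    }


print(targets("server2", "A"))  # {'server1': <Mode.SYNC: 'synch'>}
```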
15. Distributed configuration
The cluster configuration is broadcast from the Leader Node to all the Peer Nodes.
Peer Nodes broadcast it to all the connected clients.
Everybody knows who has the database.
[Diagram: Clients #1, #2 and #3 connected to Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer)]
16. Security
To join a cluster, the Server Node has to be configured with the cluster name and password
Broadcast messages are encrypted using the password
The password never crosses the network: it's stored in the configuration file
[Diagram: Server #2 (Peer) can join the cluster of Server #1 (Leader) ONLY if it knows the name and password]
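The slide says broadcast messages are encrypted with the password. The Python standard library has no symmetric cipher, so this sketch swaps in HMAC authentication instead. It shows the same two properties: the password stays in local configuration, and a node without the right name + password can neither produce nor accept valid messages. That is also why clusters with different credentials ignore each other. The key derivation (SHA-256 of `name:password`) and the message format are assumptions for illustration.

```python
import hashlib
import hmac


def sign(cluster_name: str, password: str, payload: bytes) -> bytes:
    """Tag a broadcast message with a key derived from the cluster credentials."""
    # Assumption: a plain SHA-256 digest stands in for a proper key-derivation step.
    key = hashlib.sha256(f"{cluster_name}:{password}".encode()).digest()
    return hmac.new(key, payload, hashlib.sha256).digest()


def accept(cluster_name: str, password: str, payload: bytes, tag: bytes) -> bool:
    """Accept a message only if it was produced with the same name + password."""
    return hmac.compare_digest(sign(cluster_name, password, payload), tag)


msg = b"JOIN 192.168.0.11:2424"
tag = sign("A", "aaa", msg)          # only payload and tag cross the network
print(accept("A", "aaa", msg, tag))  # True: same cluster credentials
print(accept("B", "bbb", msg, tag))  # False: another cluster ignores it
```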
17. Leader election
Each Peer Node continuously checks its connection with the Leader Node
If the connection is lost, the Peer Node tries to elect itself as the new Leader Node
A split network is resolved using a simple algorithm
Example: Server #1 (192.168.0.10:2424) and Server #2 (192.168.10.27:2424) are both acting as Leader.
Server #1 takes the leadership because it has the lower ID, where ID = <ip-address>:<port>
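The tie-break can be sketched in a few lines. The slide does not say how IDs are compared; this sketch assumes a numeric comparison of the IPv4 address and then the port, because a plain string comparison would rank 192.168.0.10 below 192.168.0.9. `node_key` and `resolve_split` are hypothetical names.

```python
import ipaddress
from typing import Iterable, Tuple


def node_key(node_id: str) -> Tuple[ipaddress.IPv4Address, int]:
    """Parse '<ip-address>:<port>' into a numerically sortable key (IPv4 assumed)."""
    host, port = node_id.rsplit(":", 1)
    return ipaddress.IPv4Address(host), int(port)


def resolve_split(leader_ids: Iterable[str]) -> str:
    """Among nodes that all claim leadership, keep the one with the lowest ID."""
    return min(leader_ids, key=node_key)


print(resolve_split(["192.168.10.27:2424", "192.168.0.10:2424"]))  # 192.168.0.10:2424
```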
18. Multiple clusters
Multiple separate clusters can coexist in the same network
Clusters can't see each other: they are separate boxes
A cluster is identified by name + password
[Diagram: Cluster 'A' (password 'aaa') and Cluster 'B' (password 'bbb'), each with Server #1 (Leader), Server #2 (Peer) and Server #3 (Peer)]
19. Fail-over
Clients know about the other nodes, so they transparently switch to working servers. No error is sent to the client application.
Running transactions will be repeated transparently too (v1.2)
[Diagram: Clients #1 to #4 connected to Server #1 (DB-1) and Server #2 (DB-2)]
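Client-side fail-over can be sketched as "try each known server in turn". This is a minimal illustration, not the OrientDB client: `call_with_failover` and `fake_request` are hypothetical, and the fake transport simulates Server #1 being down.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_failover(servers: List[str], request: Callable[[str], T]) -> T:
    """Try each known server in turn; the caller sees an error only if all fail."""
    last_error: Exception = ConnectionError("no servers configured")
    for server in servers:
        try:
            return request(server)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and switch to the next server
    raise last_error


def fake_request(server: str) -> str:
    # Hypothetical transport: Server #1 is down, Server #2 answers.
    if server == "server1":
        raise ConnectionError(f"{server} unreachable")
    return f"record loaded from {server}"


print(call_with_failover(["server1", "server2"], fake_request))
```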
20. How does replication work?
21. Synchronous Replication
Guarantees the two databases are always consistent
More expensive than asynchronous replication because the First Server waits for the Second Server's answer before sending the ACK back to the client. After the ACK, the client is sure the data is stored on multiple nodes at the same time
[Diagram: Server #1 (DB-1) and Server #2 (DB-2)]
22. Synchronous Replication steps
1) Client #1 sends an update record request to Server #1
2) Server #1 updates the record in DB-1
3) Server #1 propagates the update to Server #2
4) Server #2 updates the record in DB-2
5) Server #2 sends back OK to Server #1
6) Server #1 sends back OK to Client #1
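The six synchronous steps can be traced with an in-memory sketch. `Server`, `update_sync` and the `trace` list are hypothetical; the replica call is a plain method call standing in for the network round trip, so the ACK to the client is produced only after the replica has answered.

```python
from typing import Dict, List, Optional

trace: List[str] = []  # records the order in which the steps happen


class Server:
    """Hypothetical in-memory server used to trace synchronous replication."""

    def __init__(self, name: str, replica: Optional["Server"] = None):
        self.name = name
        self.db: Dict[str, str] = {}
        self.replica = replica

    def apply(self, rid: str, value: str) -> str:
        self.db[rid] = value
        trace.append(f"{self.name} updated {rid}")
        return "OK"

    def update_sync(self, rid: str, value: str) -> str:
        # Step 1 is the client's call to this method.
        self.apply(rid, value)                               # step 2
        if self.replica is not None:
            trace.append(f"{self.name} propagates {rid}")    # step 3
            ack = self.replica.apply(rid, value)             # step 4 (wait for it)
            trace.append(f"{self.replica.name} sent {ack}")  # step 5
        trace.append(f"{self.name} sends OK to client")      # step 6
        return "OK"


server2 = Server("Server #2")
server1 = Server("Server #1", replica=server2)
print(server1.update_sync("#10:20", "Rome"))
print(trace)
```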
23. Asynchronous Replication
Changes are propagated without waiting for the answer
The two databases could be inconsistent for a few ms
For this reason it's called “Eventually Consistent”
It's much less expensive than synchronous replication.
[Diagram: Server #1 (DB-1) and Server #2 (DB-2)]
24. Asynchronous Replication steps (4a and 4b are executed in parallel)
1) Client #1 sends an update record request to Server #1
2) Server #1 updates the record in DB-1
3) Server #1 propagates the update to Server #2
4a) Server #1 sends back OK to Client #1
4b) Server #2 updates the record in DB-2
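Asynchronous replication can be sketched with a queue drained by a background thread: the primary enqueues the change and returns the ACK at once (4a) while the replica applies it on its own schedule (4b). `AsyncReplica` and `AsyncPrimary` are hypothetical names; the `join()` call exists only so the demo can observe convergence.

```python
import queue
import threading
from typing import Dict, Tuple


class AsyncReplica:
    """Hypothetical replica applying propagated updates on its own thread."""

    def __init__(self) -> None:
        self.db: Dict[str, str] = {}
        self.inbox: "queue.Queue[Tuple[str, str]]" = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        while True:
            rid, value = self.inbox.get()
            self.db[rid] = value  # step 4b, in parallel with the client's ACK
            self.inbox.task_done()


class AsyncPrimary:
    def __init__(self, replica: AsyncReplica) -> None:
        self.db: Dict[str, str] = {}
        self.replica = replica

    def update_async(self, rid: str, value: str) -> str:
        self.db[rid] = value                  # step 2
        self.replica.inbox.put((rid, value))  # step 3: propagate, do not wait
        return "OK"                           # step 4a: ACK immediately


replica = AsyncReplica()
primary = AsyncPrimary(replica)
print(primary.update_async("#10:20", "Rome"))  # the replica may still lag here
replica.inbox.join()  # demo only: wait until the replica has caught up
print(replica.db)
```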
25. Error Management
During replication the Second Server could get an error due to a conflict (the record was modified at the same moment by another client) or an I/O problem. In this case the error is logged to disk to be fixed later.
1) Client #1 sends an update record request to Server #1
2) Server #1 updates the record in DB-1
3) Server #1 propagates the update to Server #2
4) Server #1 sends back OK to Client #1
5) Server #2 updates the record in DB-2
6) On error, Server #2 logs it to the Synch Log
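The Second Server's side of these steps can be sketched as "apply, and on failure append to a log file". The JSON-lines layout of the Synch Log, `ConflictError` and `replicate` are assumptions made for illustration, not OrientDB's format.

```python
import json
import os
import tempfile
from typing import Callable, Dict


class ConflictError(Exception):
    """Hypothetical error: the record was modified at the same moment elsewhere."""


Apply = Callable[[Dict[str, str], str, str], None]


def replicate(db: Dict[str, str], rid: str, value: str,
              apply: Apply, synch_log_path: str) -> bool:
    """Second Server side: apply a propagated update, logging failures to disk."""
    try:
        apply(db, rid, value)  # step 5
        return True
    except (ConflictError, OSError) as exc:
        # The client already received its OK (step 4), so the error is not sent
        # back; it is appended to the Synch Log to be fixed later (step 6).
        with open(synch_log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"rid": rid, "value": value,
                                  "error": str(exc)}) + "\n")
        return False


def conflicting_apply(db: Dict[str, str], rid: str, value: str) -> None:
    raise ConflictError(f"{rid} was modified by another client")


log_path = os.path.join(tempfile.mkdtemp(), "synch.log")
print(replicate({}, "#10:20", "Rome", conflicting_apply, log_path))  # False
with open(log_path, encoding="utf-8") as f:
    print(f.read().strip())
```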
26. Conflict Management
During replication, conflicts can happen if two clients update the same record at the same time
The conflict resolution strategy can be plugged in by providing implementations of the OConflictResolver interface
[Diagram: Server #2 applies the Conflict Strategy to DB-2]
27. Conflict Management
Default strategy
The default implementation merges the records: if the same fields were changed, the oldest document wins and the newest is written into the Synch Log
[Diagram: Server #2 applies the Default Conflict Strategy to DB-2 and writes the losing version to the Synch Log]
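The default strategy can be sketched as a pluggable resolver. The real OConflictResolver is a Java interface whose signature is not shown here, so the `ConflictResolver` protocol below is a hypothetical Python analogue. It also assumes each version carries a modification timestamp and treats a field present in both versions with different values as "changed in both".

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Protocol


@dataclass
class Doc:
    fields: Dict[str, Any]
    timestamp: float  # assumption: each version carries its modification time


class ConflictResolver(Protocol):
    """Hypothetical Python analogue of a pluggable OConflictResolver."""

    def resolve(self, local: Doc, remote: Doc, synch_log: List[Doc]) -> Doc: ...


class DefaultMergeResolver:
    """Merge both versions; on clashing fields the oldest document wins."""

    def resolve(self, local: Doc, remote: Doc, synch_log: List[Doc]) -> Doc:
        oldest, newest = sorted((local, remote), key=lambda d: d.timestamp)
        merged = dict(newest.fields)
        merged.update(oldest.fields)  # shared fields take the oldest value
        clash = any(k in oldest.fields and oldest.fields[k] != v
                    for k, v in newest.fields.items())
        if clash:
            synch_log.append(newest)  # keep the losing version for manual review
        return Doc(merged, newest.timestamp)


synch_log: List[Doc] = []
resolver: ConflictResolver = DefaultMergeResolver()
older = Doc({"name": "ACME", "city": "Rome"}, timestamp=1.0)
newer = Doc({"name": "ACME", "city": "London", "vat": "IT123"}, timestamp=2.0)
merged = resolver.resolve(older, newer, synch_log)
print(merged.fields)  # {'name': 'ACME', 'city': 'Rome', 'vat': 'IT123'}
```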
28. Manual control of conflicts, like SVN/Git tools
29. Display the diff of 2 databases
> compare database db1 db2
Copy a record across databases
> copy record #10:20@db1 to #10:20@db2
Copy an entire cluster across databases
> copy cluster city@db1 to city@db2
Merge two records across databases
> merge records #10:20@db1 #10:20@db2 to #10:20@db1
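Conceptually, the `compare database` output is a per-record diff. The sketch below is not the console's implementation: it models each database as a dictionary keyed by record ID (RID) and reports which records exist on one side only or differ in content. `compare_databases` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def compare_databases(db1: Dict[str, Record],
                      db2: Dict[str, Record]) -> List[Tuple[str, str]]:
    """List the RIDs that differ between two databases, like a diff tool."""
    diff: List[Tuple[str, str]] = []
    for rid in sorted(db1.keys() | db2.keys()):
        if rid not in db2:
            diff.append((rid, "only in db1"))
        elif rid not in db1:
            diff.append((rid, "only in db2"))
        elif db1[rid] != db2[rid]:
            diff.append((rid, "different content"))
    return diff


db1 = {"#10:20": {"name": "Rome"}, "#10:21": {"name": "Milan"}}
db2 = {"#10:20": {"name": "Roma"}, "#10:22": {"name": "Turin"}}
print(compare_databases(db1, db2))
```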
30. How are nodes realigned once they are up again after a failure, shutdown or network problem?
31. During replication, all operations are logged using a unique op-id with the format <node>#<serial>
[Diagram: a Client updates a record; Server #1 (DB-1) and Server #2 (DB-2) both record op-id 192.168.0.10:2424#123232 in their Operation Logs]
32. On restart, the node asks the Leader which servers it has to synchronize with
Op-ids are used to find the missed operations
[Diagram: Server #1 (DB-1) with op-id 192.168.1.11:2424#9569 and Server #2 (DB-2) with op-id 192.168.0.10:2424#123232 in their Operation Logs]
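Using the `<node>#<serial>` op-ids to find missed operations can be sketched as follows. It assumes serials increase monotonically per originating node, so the restarted node needs every operation in a peer's log whose serial exceeds the highest one it has seen from that node. The exchange with the Leader is not modeled, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple


def make_op_id(node: str, serial: int) -> str:
    """Build an op-id in the <node>#<serial> format."""
    return f"{node}#{serial}"


def parse_op_id(op_id: str) -> Tuple[str, int]:
    node, serial = op_id.rsplit("#", 1)
    return node, int(serial)


def last_seen(log: List[str]) -> Dict[str, int]:
    """Highest serial recorded in an Operation Log for each originating node."""
    seen: Dict[str, int] = {}
    for op_id in log:
        node, serial = parse_op_id(op_id)
        seen[node] = max(serial, seen.get(node, -1))
    return seen


def missed_operations(restarted_log: List[str], peer_log: List[str]) -> List[str]:
    """Operations in the peer's log that the restarted node never received."""
    seen = last_seen(restarted_log)
    missed = []
    for op_id in peer_log:
        node, serial = parse_op_id(op_id)
        if serial > seen.get(node, -1):
            missed.append(op_id)
    return missed


node1 = "192.168.0.10:2424"
restarted = [make_op_id(node1, 123230), make_op_id(node1, 123231)]
peer = restarted + [make_op_id(node1, 123232), "192.168.1.11:2424#9569"]
print(missed_operations(restarted, peer))
```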
33. To be consistent or not to be, that is the question
34. Always consistent: use it as a Master-Slave
Server #1 (Master, read + write): clients send all changes to this server, avoiding conflicts.
Server #2 (Slave, read only): replicated synchronously, one-way only, so it's always consistent. Leave it as a replica: since it's always aligned, it's the best candidate as the new Master if Server #1 is unavailable. Perfect for analysis, business intelligence and reports.
35. Read-only scaling using many asynchronous replicas
Server #1 (Master, read + write): clients send all changes to this server, avoiding conflicts.
Server #2 (Synch Slave, read only)
Server #3 to Server #N (Asynch Slaves, read only): read only and eventually consistent. Replication cost is close to zero.
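On the client side, this topology amounts to read/write splitting. The router below is a hypothetical sketch: writes always go to the Master, so there are no conflicts, and reads rotate round-robin over the slaves. Since the asynchronous slaves are only eventually consistent, a read that must see its own write would be better sent to the Master or the synch slave.

```python
import itertools
from typing import List


class ReplicaRouter:
    """Hypothetical client-side router: writes to the Master, reads to slaves."""

    def __init__(self, master: str, read_replicas: List[str]):
        self.master = master
        self._reads = itertools.cycle(read_replicas)  # round-robin over slaves

    def route(self, operation: str) -> str:
        if operation == "write":
            return self.master  # a single writer means no conflicts
        return next(self._reads)


router = ReplicaRouter("server1", ["server2", "server3", "server4"])
print([router.route(op) for op in ["write", "read", "read", "read", "read"]])
```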
36. Read/Write scaling
Multi-Master + conflict handling
[Diagram: Server #1, Server #2 and Server #3, all Masters (read + write), each serving its own Clients]
37. Read/Write scaling + sharding
Multi-Master, no conflicts! :-)
Server USA (Master, read + write) holds customers_usa: clients that write on customers_usa connect to it.
Server CHI (Master, read + write) holds customers_china: clients that write on customers_china connect to it.
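Sharding removes conflicts because every cluster has exactly one writer. A hypothetical shard map makes this explicit: each write is routed to the only Master that owns the target cluster, so two Masters never update the same record.

```python
from typing import Dict

# Hypothetical shard map: each cluster is owned by exactly one Master.
SHARDS: Dict[str, str] = {
    "customers_usa": "Server USA",
    "customers_china": "Server CHI",
}


def server_for(cluster: str) -> str:
    """Route a write to the only Master that owns the target cluster."""
    try:
        return SHARDS[cluster]
    except KeyError:
        raise ValueError(f"no server owns cluster {cluster!r}") from None


print(server_for("customers_china"))  # Server CHI
```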
38. Multi-Master + Sharding
=
big scale with high availability and no conflicts
40. NuvolaBase.com (beta)
The first Graph Database on the Cloud
always available
set up in a few seconds
use it from Web & Mobile apps
41. Luca Garulli
Author of the OrientDB and Roma <Meta> Framework Open Source projects
Member of JSR#12 (JDO 1.0) and JSR#243 (JDO 2.0)
CEO at Nuvola Base Ltd
@London, UK and @Rome, Italy
www.twitter.com/lgarulli