TalonStore: A Multi Master replication Strategy
Vishant Bhole
Illinois Institute of Technology
Chicago, US
vbhole@hawk.iit.edu
Saptarshi Chatterjee
Illinois Institute of Technology
Chicago, US
schatterjee@hawk.iit.edu
Abstract –
Data replication is the process of storing data on more than one site or node. It improves the availability of data: the result is a distributed database in which users can access the data relevant to their tasks without interfering with the work of others. Data replication is generally performed to provide a consistent copy of the data across all database nodes. Traditionally this is done by copying data from one database server to another so that all servers hold the same data. Our implementation proposes a completely different approach. Instead of copying data from one node to another, master replicas in our design do not communicate directly with each other and work virtually independently for write queries. For read queries, an independent process consults all the replicas to constitute a quorum and returns the result if a majority of the machines in the system respond with the same result.
Keywords: Multi-master replication, Distributed Systems, Big Data, Kafka, Dynamo, RDBMS, NoSQL.
I. INTRODUCTION
Synchronous multi-master replication ensures that we can write to any node and be sure that the write will be consistent across all nodes in the cluster at any given point in time. With multi-master replication any write is either committed on all nodes or not committed at all.
Multi-master replication is also known as advanced replication or symmetric replication and allows us to maintain multiple sets of identical data at various sites. In a multi-master setup, redirecting read queries to any of the masters should yield the same output. Other benefits of multi-master replication are:
● Load balancing - incoming network traffic is distributed effectively across a group of servers.
● Fault tolerance - the system continues operating properly when some of its servers fail.
● Increased data locality and availability - any client requesting data receives it from the nearest server.
Multi-master replication differs from master-slave replication, in which a single member of the group is elected as the "master" for a portion of the data and is the only node allowed to modify that portion. Other members wishing to modify the data must contact the master node. Allowing only a single master makes it easier to achieve consistency among the members of the group, but is less flexible than multi-master replication.
Synchronous multi-master replication can also be contrasted with asynchronous replication, in which passive slave servers duplicate the master's data in order to be ready to take over if the master stops working. Asynchronous replication supports eventual consistency, whereas synchronous multi-master replication can be used to implement a strictly consistent system and distributed transactions.
In this article we survey related work in this field: how existing systems implement multi-master replication, and their benefits and limitations. Based on our research we propose a new approach in which master replicas do not acquire locks on other replicas and work in parallel for write queries. For read queries, a separate process consults all the replicas to constitute a quorum.
Fig. 1 Multi-master replication system [8]
II. RELATED WORK
We studied similar architectures in existing technologies such as Oracle's RDBMS implementation, NoSQL databases like MongoDB, HDFS, Kafka's replication strategy, Cassandra, and Amazon's Dynamo. Our findings are described below.
A. Oracle’s implementation of Multi master Replication
Oracle supports two types of multimaster replication:
Asynchronous replication​: captures any local changes,
stores them in a queue, and, at regular intervals, propagates
and applies these changes at remote sites. With this form of
replication, there is a period of time before all sites achieve
data convergence.
Synchronous replication: applies any change to all sites participating in the replication environment as part of a single transaction. If propagation fails at any of the master sites, the entire transaction, including the initial change at the local master site, rolls back. This ensures data consistency across the replication environment: there is never a period of time when the data at any of the master sites does not match, so strict consistency is enforced.
Oracle first locks the local row and then uses an AFTER ROW trigger to lock the corresponding remote row. Oracle releases the locks when the transaction commits at each site. This scheme also supports distributed transactions.
Fig 2. Oracle’s implementation of Synchronous replication
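To make the all-or-nothing commit semantics concrete, the following is a minimal two-phase-commit-style sketch in Java. It illustrates the general pattern only, not Oracle's actual implementation; the ReplicaSite interface and its methods are hypothetical.

// Hypothetical interface representing one master site in the replication group.
interface ReplicaSite {
    boolean prepare(String key, String value);   // lock the row and stage the change
    void commit(String key);                      // make the staged change durable
    void rollback(String key);                    // release locks and discard the change
}

class SynchronousWriter {
    private final java.util.List<ReplicaSite> sites;

    SynchronousWriter(java.util.List<ReplicaSite> sites) {
        this.sites = sites;
    }

    // Returns true only if every site prepared successfully; otherwise rolls back everywhere.
    boolean write(String key, String value) {
        for (ReplicaSite site : sites) {
            if (!site.prepare(key, value)) {
                for (ReplicaSite s : sites) {
                    s.rollback(key);
                }
                return false;   // a single slow or failed site blocks the whole write
            }
        }
        for (ReplicaSite site : sites) {
            site.commit(key);
        }
        return true;
    }
}

The fragility noted below follows directly from this structure: the prepare loop cannot complete while any site is down or slow.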
There are a few limitations to this approach:
● Distributed transactions are complex: local changes must be rolled back if any participating system fails.
● If one of the participating nodes goes down or replies slowly, the entire system cannot accept write queries, so such a system is very fragile.
● It cannot handle Byzantine faults, because nodes answer read queries directly without constituting a quorum.
B. MongoDB data replication
We did not find any strong evidence that MongoDB supports synchronous multi-master replication. A distributed MongoDB cluster consists of a group of mongod instances (a replica set) that maintain the same data set. A replica set consists of several data-bearing nodes and optionally one arbiter node. Among the data-bearing nodes, exactly one member is deemed the primary node, while the other nodes are secondaries.
The primary node receives all write operations[11]. The secondaries replicate the primary's oplog and apply the operations to their own data sets. If the primary becomes unavailable, an eligible secondary holds an election to elect itself the new primary. A rollback reverts write operations on a former primary when that member rejoins its replica set after a failover, if the primary had accepted write operations that the secondaries had not successfully replicated before it stepped down.
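The rollback scenario above can be mitigated per write by requesting a majority write concern, so the client is only acknowledged once a majority of data-bearing members hold the write. A minimal sketch using the official MongoDB Java driver follows; the connection string, database name and collection name are placeholders, and this is an illustration rather than part of the system described in this paper.

import com.mongodb.WriteConcern;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class MajorityWriteExample {
    public static void main(String[] args) {
        // Placeholder connection string for a three-member replica set.
        try (MongoClient client = MongoClients.create("mongodb://host1,host2,host3/?replicaSet=rs0")) {
            MongoCollection<Document> coll = client.getDatabase("talon")
                    .getCollection("kv")
                    // Acknowledge the write only after a majority of data-bearing
                    // members have applied it, so it survives a failover election.
                    .withWriteConcern(WriteConcern.MAJORITY);
            coll.insertOne(new Document("key", "k1").append("value", "v1"));
        }
    }
}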
Fig. 3 MongoDB Replication
C. Kafka Replication
Every topic partition in Kafka is replicated n (a configurable replication factor) times. This allows Kafka to automatically fail over to these replicas when a server in the cluster fails, so that messages remain available in the presence of failures. Replication in Kafka happens at the partition granularity: the partition's write-ahead log is replicated in order to n servers. Out of the n replicas, one is designated as the leader while the others are followers. The leader takes writes from the producer and the followers merely copy the leader's log in order.
The leader for every partition tracks the in-sync replica (ISR) list by computing the lag of every replica from itself. When a producer sends a message to the broker, it is written by the leader and replicated to all the partition's replicas. A message is committed only after it has been successfully copied to all the in-sync replicas.
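This commit rule can be observed from the producer side: with acks=all, a send is acknowledged only once the leader and every in-sync replica have the record. A minimal sketch with the standard Kafka Java client (the broker address is a placeholder; the "savedata" topic matches the one used later in this paper):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ReplicatedSend {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // "all": the leader waits for the full in-sync replica set before acknowledging.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The returned future completes only after the record is committed to all in-sync replicas.
            producer.send(new ProducerRecord<>("savedata", "key1", "value1")).get();
        }
    }
}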
 
Fig. 4 Kafka replication [7]
D. HDFS Replication
HDFS replication enables replication of HDFS data from one HDFS service to another, synchronizing the data set on the destination service with the data set on the source service. While performing a replication we need to ensure that the source directory is not modified: a file added during replication does not get replicated, and if we delete a file during replication, the replication fails.
HDFS is not optimized for incremental writes or appends; it is suitable for a write-once, read-many-times model. It stores each file as a sequence of blocks, which are placed on different DataNodes[12]. The NameNode keeps track of the blocks on each DataNode. The default replication factor is 3.
Fig. 5 Block Replication[12]
Each HDFS block is constructed through a write pipeline. Bytes are pushed into the pipeline packet by packet. There are effectively three stages in an HDFS block write:
Stage 1 - Pipeline setup. A Write Block request is sent by the client downstream along the pipeline. After the last DataNode receives the request, an ack is sent upstream along the pipeline back to the client.
Stage 2 - Data streaming. User data is first buffered on the client side; once a packet fills up, the data is pushed into the pipeline.
Stage 3 - Close. The client sends a close request only after all packets have been acknowledged. When the block replica is finalized, the pipeline is shut down.
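From the client's point of view the three stages are hidden behind the Hadoop FileSystem API: create() sets up the pipeline, write calls stream packets, and closing the stream issues the final close request. A minimal sketch with the Hadoop Java client follows; the NameNode URI and file path are placeholders.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Replication factor requested by this client; HDFS defaults to 3.
        conf.set("dfs.replication", "3");
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // Stage 1: create() asks the NameNode for a block and sets up the write pipeline.
        try (FSDataOutputStream out = fs.create(new Path("/talon/sample.txt"))) {
            // Stage 2: bytes are buffered client-side and pushed down the pipeline packet by packet.
            out.writeBytes("hello, replicated world\n");
        }
        // Stage 3: closing the stream sends the close request once all packets are acknowledged.
        fs.close();
    }
}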
Fig. 6 Block Construction Pipeline
E. Amazon's Dynamo
Amazon's Dynamo is a highly available key-value storage system. It supports primary-key access to data, which is useful for services like session management. For these types of services Dynamo provides a highly available system that always accepts write queries; writes are never rejected, and this requirement pushes the complexity of conflict resolution onto data readers. Dynamo combines many core distributed-systems techniques to solve the problem at Amazon scale, focusing on data versioning, partitioning and replication.
Data Versioning
Because writes are propagated to replicas asynchronously, Dynamo offers eventual consistency: a get operation may return an object from a node that has not yet received the latest version. The result of each modification is treated as a new, immutable version of the data, tagged with a vector clock that is incremented at every version. An update specifies which version of the object it is modifying, and when a client reads the object it is responsible for merging divergent versions as needed. Dynamo also provides a background process that automatically merges versions of data that do not conflict.
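A minimal sketch of the vector-clock bookkeeping described above (an illustration, not Dynamo's actual code): each coordinating replica increments its own counter on a write, and two versions conflict only when neither clock dominates the other.

import java.util.HashMap;
import java.util.Map;

// Simplified vector clock: one counter per node that has coordinated a write.
class VectorClock {
    private final Map<String, Long> counters = new HashMap<>();

    // Called by the coordinating node when it applies a write.
    void increment(String nodeId) {
        counters.merge(nodeId, 1L, Long::sum);
    }

    // True if this clock is greater than or equal to the other on every entry,
    // i.e. this version already reflects everything the other version saw.
    boolean descendsFrom(VectorClock other) {
        for (Map.Entry<String, Long> e : other.counters.entrySet()) {
            if (counters.getOrDefault(e.getKey(), 0L) < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    // Two versions are in conflict when neither descends from the other;
    // the client (or a background process) must then merge them.
    boolean conflictsWith(VectorClock other) {
        return !descendsFrom(other) && !other.descendsFrom(this);
    }
}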
​Partitioning
Dynamo allows the system to scale incrementally by adding more nodes, which requires the system to dynamically partition data over the set of nodes. To achieve this, Dynamo uses consistent hashing to assign each data item to a node. The nodes are arranged in a ring in which the largest hash value wraps around to the smallest.
The arrival or departure of a node only affects that node's immediate neighbours on the ring.
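A minimal consistent-hashing sketch in Java, using a sorted map as the ring; the node names and the hash function are illustrative only, and Dynamo additionally places each physical node at multiple "virtual node" positions.

import java.util.SortedMap;
import java.util.TreeMap;

class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String nodeId) {
        ring.put(hash(nodeId), nodeId);
    }

    void removeNode(String nodeId) {
        ring.remove(hash(nodeId));
    }

    // A key is owned by the first node clockwise from its hash position;
    // the largest position wraps around to the smallest one.
    String nodeFor(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private int hash(String s) {
        // Illustrative only; a production ring would use a stronger hash such as MD5.
        return s.hashCode() & 0x7fffffff;
    }
}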
Fig. 7 Dynamo Ring Arrangement[3]
Replication
In Dynamo, data is replicated on many hosts to provide durability and high availability. Each data item is replicated at n hosts: every key is assigned a coordinator node, which is in charge of replicating the data at n-1 neighbouring hosts in the ring.
​F. Data replication in Cassandra
Cassandra's design is influenced by Amazon's Dynamo paper published in 2007. It divides a hash ring into several chunks and keeps N replicas of each chunk on different nodes. Developers can tune quorums and active anti-entropy to keep replicas up to date[4].
Cassandra uses replication to achieve high availability and durability. Each data item is replicated at N (a configurable replication factor) hosts. Each key, k, is assigned to a coordinator node, which takes care of the replication: the coordinator stores the data locally and also replicates it at N-1 other nodes. Cassandra provides configurable replication policies such as "Rack Unaware", "Rack Aware" (within a datacenter) and "Datacenter Aware". Cassandra elects a leader amongst its nodes using ZooKeeper. A node joining the cluster contacts the leader, which in turn tells it which data ranges it is a replica for. The metadata about the ranges a node is responsible for is stored locally at each node and inside ZooKeeper, so when a node crashes and comes back up it knows what ranges it was responsible for. All nodes are aware of every other node in the system and hence of the ranges they are responsible for.
Fig. 8 Cassandra Data Flow
There are three types of read requests that a coordinator sends to replicas:
a) Direct request
b) Digest request
c) Read repair request
● The coordinator sends a direct request to one of the replicas.
● The coordinator then sends digest requests to the number of replicas specified by the consistency level and checks whether the returned data is up to date.
● Finally, the coordinator sends digest requests to all the remaining replicas. If any node returns an out-of-date value, a background read repair request updates that data. This process is called the read repair mechanism.
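The essence of the digest check can be sketched as follows. This is a simplification, not Cassandra's implementation: the Replica interface is hypothetical, and for brevity the sketch assumes the directly-read replica holds the newest value, whereas Cassandra actually reconciles versions by timestamp before repairing.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.List;

class ReadCoordinator {
    // Hypothetical replica abstraction: read() returns the stored value, repair() overwrites it.
    interface Replica {
        String read(String key) throws Exception;
        void repair(String key, String freshValue);
    }

    String quorumRead(String key, List<Replica> replicas) throws Exception {
        // Direct request: fetch the actual value from the first replica.
        String value = replicas.get(0).read(key);
        byte[] digest = digest(value);

        // Digest requests: the remaining replicas only need to confirm the hash.
        for (Replica r : replicas.subList(1, replicas.size())) {
            String other = r.read(key);
            if (!MessageDigest.isEqual(digest, digest(other))) {
                // Out-of-date replica: schedule a read repair in the background.
                r.repair(key, value);
            }
        }
        return value;
    }

    private byte[] digest(String value) throws Exception {
        return MessageDigest.getInstance("MD5").digest(value.getBytes(StandardCharsets.UTF_8));
    }
}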
V. OUR PROPOSED SYSTEM DESIGN
Our implementation, inspired by the Dynamo system[3], proposes a slightly different approach: we have decoupled the data nodes from the event bus and coordinator system.
We reviewed several scalable system design patterns[2] and found the event-based architecture model most suitable for this problem. Event-based architecture supports several communication styles:
● Publish-subscribe
● Broadcast
● Point-to-point
The publish-subscribe communication style decouples sender and receiver and facilitates asynchronous communication. Event-driven architecture (EDA) promotes the production, detection, consumption of, and reaction to events[10]. The main advantage of this architecture is that its components are loosely coupled.
Fig. 9 Event Driven Architectural Model
A. The client sends the write query to a write coordinator system. The coordinator pushes the data into a queue under a particular topic.
@RequestMapping(method = RequestMethod.POST,
        consumes = {"application/x-www-form-urlencoded"},
        value = "/api/savedata")
public String saveData(@RequestParam Map<String, String> savequery) {
    // Extract the key/value pair from the form-encoded request.
    String key = savequery.get("key");
    String value = savequery.get("value");
    // Publish the write to the "savedata" topic; every data node subscribes to it.
    kafkaTemplate.send("savedata", dataFormattor(key, value));
    return "Data successfully Saved";
}
B. All the participating systems subscribe to that topic and listen for changes. Each subscriber belongs to a different consumer group, so they process write queries in parallel and independently of each other, saving the changes to their local file systems. At this point we can send a write acknowledgement back to the client once the data is written to at least one node, or wait for acks from all the nodes, thus enforcing strong consistency.
@KafkaListener(topics = "savedata", groupId = "${diskpath.property}")
public void saveDataToDisk(String message) throws IOException {
    String data[] = dataDeserializer(message);
    // Each node persists the value under its own disk path, using the key as the file name.
    File file = new File(diskpath + "/" + data[0]);
    file.getParentFile().mkdirs();
    FileWriter writer = new FileWriter(file);
    try {
        writer.write(data[1]);
    } catch (Exception e) {
        logger.info(e.getMessage());
    } finally {
        writer.close();
    }
}
C. When a read query reaches a participating node, instead of replying directly it constitutes a quorum: it waits for all the participating nodes to catch up and returns the result to the client only if all of them agree on the same data, thus enforcing strong consistency. Alternatively, we can reply as soon as a majority of the nodes agree on the result, which allows detection of Byzantine faults (a sketch of this majority-vote variant follows the code below).
@KafkaListener(topics = "retrievedata", groupId = "coordinator")
public void retrieveValue(String message) throws FileNotFoundException, IOException {
    String data[] = dataDeserializer(message);
    // Read the value from all the participating nodes.
    BufferedInputStream reader1 = new BufferedInputStream(new FileInputStream(diskpath + "/" + data[1]));
    BufferedInputStream reader2 = new BufferedInputStream(new FileInputStream(machine1 + "/" + data[1]));
    BufferedInputStream reader3 = new BufferedInputStream(new FileInputStream(machine2 + "/" + data[1]));
    boolean running = true;
    while (running) {
        // Wait for all the nodes to catch up.
        if (reader1.available() > 0 && reader2.available() > 0 && reader3.available() > 0) {
            String val1 = IOUtils.toString(reader1, "UTF-8");
            String val2 = IOUtils.toString(reader2, "UTF-8");
            String val3 = IOUtils.toString(reader3, "UTF-8");
            // Constitute a quorum: all 3 nodes must match, enforcing strong consistency.
            if (val1.equals(val2) && val2.equals(val3)) {
                webSocket.convertAndSend("/topic/backToClient/" + data[0], val1);
                running = false;
            } else {
                webSocket.convertAndSend("/topic/backToClient/" + data[0], "Nodes give different Data");
                running = false;
            }
        } else {
            try {
                Thread.sleep(150);
            } catch (InterruptedException ex) {
                running = false;
            }
        }
    }
}
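As noted above, the all-three comparison could be relaxed to a majority vote, which still detects a single faulty or lagging node without blocking on it. A minimal sketch of that variant (illustrative only, not part of the current implementation):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MajorityQuorum {
    // Returns the value reported by a strict majority of nodes, or null when no majority exists.
    static String resolve(List<String> nodeValues) {
        Map<String, Integer> votes = new HashMap<>();
        for (String v : nodeValues) {
            votes.merge(v, 1, Integer::sum);
        }
        int needed = nodeValues.size() / 2 + 1;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() >= needed) {
                return e.getKey();   // any node that disagreed is a suspected faulty node
            }
        }
        return null;
    }
}

For the three-node setup above this would be invoked as MajorityQuorum.resolve(java.util.Arrays.asList(val1, val2, val3)), replying with the majority value and flagging the dissenting node for repair.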
 
 
In essence this proposed architecture looks as follows.
Fig. 10 Talon Store Architecture.
VI. RESULTS AND DISCUSSION
We tested our system with multiple parallel client requests and a dual-partition queue topic with 2 brokers, and recorded the following metrics on macOS Mojave (MacBook Pro 2017, 2.3 GHz Intel Core i5, 8 GB 2133 MHz LPDDR3).
Our results show that this architecture performs marginally faster than Cassandra under a similar load, but much of this data may be influenced by the fact that we tested on a single standalone system rather than an actual distributed deployment.
However, the proposed design is more decoupled, so each block of the system can be tweaked and configured separately.
Write query performance
Read query performance
start.time, end.time - experiment start and end times.
fetch.size - amount of data to fetch in a single request.
data.consumed.in.MB - total size of all messages consumed.
MB.sec - data transferred in MB per second (throughput by size).
data.consumed.in.nMsg - total number of messages consumed during the test.
nMsg.sec - number of messages consumed per second (throughput by message count).
VII. FUTURE WORK
In this work we only discussed how to detect Byzantine faults and did not actually correct them. However, this can be addressed the way Cassandra solves it: once the read coordinator detects the faulty node, it can send a data repair request to that node, which in turn fixes the corrupt data by using a gossip protocol with the other participating nodes.
Also, for this research we implemented a web-based client, and a web-based client cannot open a socket connection to a server running on a different domain due to CORS restrictions, so we had to serve read queries from the REST controller that receives the query. This limitation can easily be bypassed with a non-web-based client.
VIII. APPENDIX
Project Final Demo Link
https://www.youtube.com/watch?v=0jBl7rOrQiU
Source code - https://github.com/sap9433/TalonSystems
​IX​. ​REFERENCES
[1] Shvachko, K., Kuang, H., Radia, S. and Chansler, R. The Hadoop Distributed File System. [online] Available at: http://storageconference.us/2010/Papers/MSST/Shvachko.pdf [Accessed 28 Nov. 2018].
[2] Kreps, J., Narkhede, N. and Rao, J. Kafka: a Distributed Messaging System for Log Processing. [online] Available at: http://notes.stephenholiday.com/Kafka.pdf [Accessed 28 Nov. 2018].
[3] DeCandia, G., Sivasubramanian, S., Lakshman, A. and Hastorun, D. Dynamo: Amazon's Highly Available Key-value Store. [online] Available at: http://courses.cse.tamu.edu/caverlee/csce438/readings/dynamo-paper.pdf [Accessed 28 Nov. 2018].
[4] Lakshman, A. and Malik, P. Cassandra - A Decentralized Structured Storage System. [online] Available at: https://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf [Accessed 28 Nov. 2018].
[5] "Multi-Master Replication." Percona XtraDB Cluster Documentation. [online] Available at: https://www.percona.com/doc/percona-xtradb-cluster/LATEST/features/multimaster-replication.html [Accessed 26 Nov. 2018].
[6] "Master Replication Concepts and Architecture." Oracle Database Advanced Replication, August 01, 2008. [online] Available at: https://docs.oracle.com/cd/B28359_01/server.111/b28326/repmaster.htm#sthref144 [Accessed 26 Nov. 2018].
[7] Narkhede, N. Hands-free Kafka Replication: A lesson in operational simplicity. Confluent. [online] Available at: https://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/ [Accessed 28 Nov. 2018].
[8] Erculiani, F. "Semi-Sync Replication Design." google/mysql-tools wiki, GitHub. [online] Available at: https://github.com/google/mysql-tools/wiki/Semi-Sync-Replication-Design [Accessed 06 Dec. 2018].
[9] "Master Replication Concepts and Architecture." Oracle Database Advanced Replication, August 01, 2008. [online] Available at: https://docs.oracle.com/cd/B28359_01/server.111/b28326/repmaster.htm#sthref144 [Accessed 26 Nov. 2018].
[10] Tong Lai Yu. "Distributed Systems Architecture." [online] Available at: http://cse.csusb.edu/tongyu/courses/cs660/notes/distarch.php [Accessed 06 Dec. 2018].
[11] "Replication." MongoDB Manual. [online] Available at: https://docs.mongodb.com/manual/replication/ [Accessed 06 Dec. 2018].
[12] Bakshi, A. "Apache Hadoop HDFS Architecture." Edureka Blog, December 05, 2018. [online] Available at: https://www.edureka.co/blog/apache-hadoop-hdfs-architecture/ [Accessed 06 Dec. 2018].