Introduction
System Design
Performance
Conclusions
MoSQL: An Elastic Storage Engine For MySQL
Alexander Tomic, Daniele Sciascia, Fernando Pedone
University of Lugano, Switzerland
March 20, 2013
ACM SAC 2013 - Dependable and Distributed Systems Track
MySQL is a popular open-source RDBMS at the core of
many web-based applications (part of “LAMP” stack)
Typical approaches to scaling MySQL in the wild (e.g.
sharding, asynchronous replication) provide weak
guarantees and are inflexible [1]
Elasticity is highly desirable when load is cyclical and
over-provisioning and energy costs are significant
Strong guarantees (serializability) make development much
easier
[1] Though since the original master’s thesis in Sept 2011, some commercial
offerings have attempted to remedy this. Details in appendix.
What do we define as “elastic”?
Add/remove servers to/from a running system
Ideally little performance impact
Get Good Things like higher throughput, reduced latency,
increased system capacity
SQL (90’s) -> NoSQL (00’s) -> NewSQL (10’s)
SQL transactions are great, but legacy RDBMS architectures
too slow and inflexible
“NoSQL” systems of various flavours attempted to fill the
void (Dynamo, BigTable, etc.), but pushed significant
complexity up to app. developers
Re-emergence of (semi-)relational model in contemporary
systems such as Spanner and Megastore (Google)
Ultimately, no panacea but the usual game of tradeoffs
MySQL Servers
Storage Nodes
Certifier
MySQL Servers
MySQL has a storage engine
interface enabling different
storage strategies to be
implemented
Serves as a translator from SQL
-> our storage layer API
Multiple MySQL “servers” can
be connected arbitrarily to
storage nodes
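The translator role described above can be pictured with a minimal sketch. It is not MoSQL's code or MySQL's actual handler interface: `StorageNode`, `EngineHandler`, `write_row`, `read_row`, the `table/primary_key` key format, and the hash-based placement rule are all hypothetical names and assumptions chosen for illustration.

```python
import zlib


class StorageNode:
    """In-memory stand-in for one storage node: a plain key-value map."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def put(self, key, value):
        self.data[key] = value


class EngineHandler:
    """Hypothetical storage-engine handler: row operations in, key-value calls out."""

    def __init__(self, nodes):
        self.nodes = nodes

    def _node_for(self, key):
        # Assumed placement rule (not from the slides): hash the key over the nodes.
        return self.nodes[zlib.crc32(key.encode()) % len(self.nodes)]

    def write_row(self, table, primary_key, row):
        # Roughly what an INSERT or UPDATE of a single row would turn into.
        key = f"{table}/{primary_key}"
        self._node_for(key).put(key, dict(row))

    def read_row(self, table, primary_key):
        # Roughly what a primary-key SELECT would turn into.
        key = f"{table}/{primary_key}"
        return self._node_for(key).get(key)


# Two "MySQL servers" attached to the same set of storage nodes.
nodes = [StorageNode(), StorageNode(), StorageNode()]
server_a = EngineHandler(nodes)
server_b = EngineHandler(nodes)

server_a.write_row("users", 42, {"name": "alice"})
row_seen_by_b = server_b.read_row("users", 42)
```

Because both handlers resolve keys against the same nodes, a row written through one server is visible through the other, which is the property that lets servers be attached to storage nodes freely.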
Certifier
Checks whether entries read by
committing update tx are
up-to-date at time of commit
Propagates new entries created
by committing tx to nodes
Read-only tx do not require
certification; updates proceed
optimistically
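The certification rule above can be sketched in a few lines. This is a simplified model under stated assumptions, not the MoSQL implementation: per-key version counters, the `Certifier` class, its `certify` method, and propagation of committed entries to every node are all hypothetical choices made for illustration.

```python
class Certifier:
    """Toy certifier: commits an update tx only if everything it read is still current."""

    def __init__(self, nodes):
        self.versions = {}  # key -> latest committed version (0 = never written)
        self.nodes = nodes  # dicts standing in for storage nodes

    def certify(self, readset, writeset):
        # Read-only transactions would not be sent here at all;
        # the early return models "no certification required".
        if not writeset:
            return True

        # Abort if any entry read by the tx has been overwritten since it was read.
        for key, version_seen in readset.items():
            if self.versions.get(key, 0) != version_seen:
                return False

        # Commit: install new versions and propagate them to the nodes
        # (propagating to all nodes is an assumption of this sketch).
        for key, value in writeset.items():
            new_version = self.versions.get(key, 0) + 1
            self.versions[key] = new_version
            for node in self.nodes:
                node[key] = (new_version, value)
        return True


nodes = [{}, {}]
certifier = Certifier(nodes)

# t1 and t2 both read "x" at version 0, then both try to update it.
t1_committed = certifier.certify(readset={"x": 0}, writeset={"x": "from t1"})
t2_committed = certifier.certify(readset={"x": 0}, writeset={"x": "from t2"})
```

In this run the first transaction commits and the second aborts, because the entry it read is no longer up to date by the time it is certified.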
Future Work
Appendix: Similar Offerings to MoSQL
Appendix: B+Tree Details
Future Work
Support for different Paxos implementations (experiments
shown use multicast ring-paxos which is of limited use in
“cloud” environments)
Partitioned certification
Usability improvements
We are in the process of open-sourcing MoSQL! Project
page will be updated in the coming weeks:
http://dslab.inf.usi.ch/mosql
Related Work
ElasTraS (UCSB): Elastic data store providing transactional
multi-key access to data
ecStore (NU Singapore): peer-to-peer elastic storage with
range-query and tx support; neither ecStore nor ElasTraS
support full SQL transactions
Spanner (Google): Semi-relational model with wide-area tx,
but depends on specialized hardware providing
globally-meaningful timestamps
Megastore (Google): Semi-relational wide-area tx but with
low latency within small partitions; 2PC used for
cross-partition tx
MySQL Specific
GenieDB: A storage engine for MySQL with a geo-replicated
storage layer. Does not appear to offer elasticity.
Xeround: A cloud database service for MySQL applications
promising elastic storage for MySQL. Based on a quick look at
the patent and whitepaper they have available for download,
ACID compliance appears to be provided through a
quorum-based approach.
Parelastic: Claims many of the features that MoSQL provides,
including elasticity. I would have to register in order to get
the whitepaper, but judging by the patent they have received,
it looks superficially like some kind of middleware approach
not unlike Sprint.
MySQL “Compatible”
Clustrix: Shared-nothing system claiming MySQL
compatibility and ACID compliance. Engine written from
the bottom up to be distributed, using push-down of compiled
query fragments to individual nodes, apparently enabling
better concurrency.
Scalebase: Another example of Sprint-like middleware that
resides between the application and “demoted” RDBMS
nodes and manages transactions and the distribution of data
across nodes.
Intalio: Claims elastic scalability and compatibility with a
number of different RDBMS systems, so it would appear to
be some sort of Sprint-like middleware, but details are a bit
scarce.
B+Tree and Row Data
Boxes a) - i) are key-values.
Figure: B+Tree with root node (a) holding keys 100 and 120;
nodes (b), (c), (d) holding keys 95 / 100, 105 / 120, 125;
row entries (e)-(i) holding the raw data for keys 95, 100, 105, 120, 125.
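To make the "every box is a key-value" idea concrete, here is a minimal sketch that stores the tree from the figure in a flat dictionary and looks up a row by walking from the root. The dictionary layout, the field names `keys`, `children`, and `rows`, and the `lookup` helper are assumptions for illustration, as is the assignment of labels (b)-(i) to particular keys, which is read off the figure.

```python
from bisect import bisect_right

# Each box of the figure is one entry in a flat key-value store.
store = {
    "a": {"keys": [100, 120], "children": ["b", "c", "d"]},  # root
    "b": {"keys": [95], "rows": ["e"]},
    "c": {"keys": [100, 105], "rows": ["f", "g"]},
    "d": {"keys": [120, 125], "rows": ["h", "i"]},
    "e": "<raw data 95>",
    "f": "<raw data 100>",
    "g": "<raw data 105>",
    "h": "<raw data 120>",
    "i": "<raw data 125>",
}


def lookup(store, key, root="a"):
    """Return (row data or None, list of entries read along the way)."""
    entries_read = [root]
    node = store[root]
    # Descend through inner nodes: child i covers keys below keys[i].
    while "children" in node:
        child_id = node["children"][bisect_right(node["keys"], key)]
        entries_read.append(child_id)
        node = store[child_id]
    # At a leaf: follow the pointer to the row entry if the key is present.
    if key in node["keys"]:
        row_id = node["rows"][node["keys"].index(key)]
        entries_read.append(row_id)
        return store[row_id], entries_read
    return None, entries_read


row, entries_read = lookup(store, 105)
```

Every lookup reads the root entry (a) first, so (a) appears in the set of entries touched by any transaction that uses this index.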
Some Unnecessary Aborts
Consider concurrent tx:
t1 = INSERT .. (60) and t2 = INSERT .. (130).
Writesets of t1 and t2 are (a) and (a, d), so t1 will be aborted if
certified after t2, even though the two inserts target different keys.
Figure: the same B+Tree, with root node (a) holding keys 100 and 120,
nodes (b), (c), (d) holding keys 95 / 100, 105 / 120, 125,
and row entries (e)-(i) holding the raw data.
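The abort in this example can be reproduced with a deliberately simplified model. The sketch below takes the two sets of entries exactly as stated on the slide and treats any overlap with an already committed concurrent transaction as a conflict; that overlap rule and the `certify_in_order` helper are assumptions standing in for the real certification test.

```python
def certify_in_order(transactions):
    """Certify concurrent transactions one by one.

    transactions: list of (name, set of tree entries touched), in certification order.
    Returns a dict mapping each name to True (committed) or False (aborted).
    """
    committed_entries = set()
    outcome = {}
    for name, entries in transactions:
        if entries & committed_entries:
            # Shares an entry with a concurrent tx that already committed.
            outcome[name] = False
        else:
            outcome[name] = True
            committed_entries |= entries
    return outcome


# Entry sets as given on the slide: t1 touches (a), t2 touches (a, d).
t1 = ("t1: INSERT 60", {"a"})
t2 = ("t2: INSERT 130", {"a", "d"})

outcome = certify_in_order([t2, t1])
```

Here t1 is aborted solely because both transactions touch the shared entry (a), although keys 60 and 130 never collide at the row level; this is the sense in which the abort is unnecessary.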