The document describes Megastore, a scalable and highly available storage system for interactive services. Megastore provides ACID semantics within entity groups and uses a modified Paxos algorithm to replicate each entity group's write-ahead log synchronously across datacenters. It scales by partitioning data into entity groups and achieves high availability through this synchronous replication. The system aims to combine the ease of use of relational databases with the scalability of NoSQL systems.
1. Megastore
Providing Scalable, Highly Available
Storage for Interactive Services
Paper by Jason Baker et al.
Presented by Arinto Murdopo
arinto@kth.se
7/11/2012 1
2. Outline
Motivation
Megastore:
Features
Scalability
Availability
Putting them all together
Observation
Conclusions
3. Motivation
Conflicting requirements
• RDBMS – easy to use, but does not scale
• NoSQL – scales, but is not easy to use
Interactive online services
• Highly available and fast response time
4. Here comes Megastore
easy to use
• ACID semantics
scalable
• data partitioning
highly available
• synchronous replication through modified Paxos
5. Easy to use - Features
cost-transparent APIs
• No API for joins
• Joins are implemented in application code
data model
• schema, table (entity), property
• entity clustering
• indexes: local, global
• Bigtable column name = Megastore table name + property name, e.g. User.name
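The column-naming rule and entity clustering can be sketched in Python. This is a simplified model under our own assumptions: the `Entity` class and `to_bigtable_row` helper are hypothetical, and the row key is simply the primary-key values joined with commas rather than Megastore's real key encoding.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical in-memory stand-in for a Megastore entity (a table row)."""
    table: str                 # Megastore table name, e.g. "User" or "Photo"
    key: tuple                 # primary key; child keys start with the root's key
    properties: dict = field(default_factory=dict)


def to_bigtable_row(entity):
    """Map an entity to a (row_key, {column: value}) pair.

    Assumption: the row key is the primary-key values joined with commas.
    Each column is named "<table>.<property>", so entities from different
    tables can share one Bigtable without column-name collisions.
    """
    row_key = ",".join(str(part) for part in entity.key)
    columns = {f"{entity.table}.{name}": value
               for name, value in entity.properties.items()}
    return row_key, columns


# Entity clustering: the child Photo key is prefixed by its root User key,
# so its row sorts right after the user's row.
user = Entity("User", (101,), {"name": "John"})
photo = Entity("Photo", (101, 500), {"tag": "Dinner"})
print(to_bigtable_row(user))   # ('101', {'User.name': 'John'})
print(to_bigtable_row(photo))  # ('101,500', {'Photo.tag': 'Dinner'})
```

Comma-joined strings only sort correctly here because the keys share a prefix; in general ("99" vs "101") a real system needs an order-preserving key encoding.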
6. Easy to use - Features
transactions and concurrency control
• multiversion concurrency control (MVCC) using Bigtable cell timestamps
• transaction lifecycle: read, application logic, commit, apply, clean up
others
• integrated backup system (full snapshots and incremental backups of transaction logs)
• encryption
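The five-step transaction lifecycle can be illustrated with a minimal single-process sketch. The `EntityGroup` class and `run_transaction` helper are hypothetical; `try_commit` merely stands in for a Paxos round on one log position (at most one writer fills each position, losers retry), and there is no real replication.

```python
class EntityGroup:
    """Hypothetical single-copy model of one entity group and its write-ahead log."""

    def __init__(self):
        self.log = []       # committed entries; each is a list of (key, value) mutations
        self.data = {}      # entity state after applying the log
        self.applied = 0    # number of log entries already applied to self.data

    def try_commit(self, position, mutations):
        # Stand-in for Paxos on one log position: fails if already taken.
        if len(self.log) != position:
            return False
        self.log.append(mutations)
        return True

    def apply(self):
        # Apply committed but not-yet-applied entries, in log order.
        while self.applied < len(self.log):
            for key, value in self.log[self.applied]:
                self.data[key] = value
            self.applied += 1


def run_transaction(group, logic, max_attempts=3):
    for _ in range(max_attempts):
        group.apply()                               # simplification: catch up first
        position = len(group.log)                   # 1. read: last committed position
        snapshot = dict(group.data)                 #    and a snapshot at that position
        mutations = logic(snapshot)                 # 2. application logic buffers writes
        if group.try_commit(position, mutations):   # 3. commit: append to the log
            group.apply()                           # 4. apply mutations to entity data
            # 5. clean up: nothing to delete in this in-memory sketch
            return True
    return False                                    # lost the race on every attempt


account = EntityGroup()
run_transaction(account, lambda snap: [("balance", snap.get("balance", 0) + 10)])
print(account.data)  # {'balance': 10}

# Two writers that read the same log position: only the first commit succeeds.
pos = len(account.log)
print(account.try_commit(pos, [("balance", 20)]))  # True
print(account.try_commit(pos, [("balance", 30)]))  # False: must restart from step 1
```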
7. Scalable
Scale the replication scheme
Data partitioning
• Entity group concept
Data locality
• Entity group locality
• Bigtable instances locality
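Entity group locality follows from key ordering: if every key in a group starts with the root's key, the whole group occupies one contiguous range of the sorted key space and can be read with a single range scan. A minimal sketch, assuming tuple keys in a sorted Python list as a stand-in for Bigtable's sorted rows; `scan_entity_group` is a hypothetical helper.

```python
import bisect


def scan_entity_group(sorted_keys, root_key):
    """Return all keys in the entity group rooted at root_key.

    Assumption: sorted_keys is sorted, and every member key starts with
    root_key (the root entity's key is exactly root_key).
    """
    start = bisect.bisect_left(sorted_keys, root_key)
    end = start
    # Members are adjacent in key order, so stop at the first key
    # that no longer has root_key as its prefix.
    while end < len(sorted_keys) and sorted_keys[end][:len(root_key)] == root_key:
        end += 1
    return sorted_keys[start:end]


keys = sorted([(102,), (101, 500), (101,), (101, 501), (103, 7)])
print(scan_entity_group(keys, (101,)))  # [(101,), (101, 500), (101, 501)]
```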
8. Entity Groups
An entity is like a row (an instance) of a table.
An entity group is a group of entities, e.g.:
Email Application
• Email account
Blog Application
• User Profile
• Blog post + metadata
• Blog unique name
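Choosing entity groups determines which transactions get ACID semantics cheaply. The sketch below assumes a hypothetical key layout in which the first key component names the group root (e.g. the email account); `entity_group` and `needs_cross_group_mechanism` are illustrative helpers, not Megastore APIs.

```python
def entity_group(key):
    """Hypothetical rule: the first key component identifies the entity group root."""
    return key[0]


def needs_cross_group_mechanism(keys):
    """True if a transaction touches more than one entity group.

    Within one group Megastore provides ACID semantics; work spanning groups
    needs another mechanism (e.g. queues or two-phase commit).
    """
    return len({entity_group(k) for k in keys}) > 1


# Email: one entity group per account.
print(needs_cross_group_mechanism([("alice", "inbox", 1), ("alice", "label", "work")]))  # False
print(needs_cross_group_mechanism([("alice", "inbox", 1), ("bob", "inbox", 9)]))          # True
```

Delivering mail from one account to another is therefore a cross-group update, while reorganizing a single mailbox stays within one group.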
12. Modified Paxos – Fast Reads
Ask the local Coordinator whether the local replica is up to date for the entity group; if so, read locally without inter-replica communication
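The fast-read decision can be sketched as follows, assuming a hypothetical `Coordinator` that keeps the set of entity groups for which the local replica has seen every committed write, and a hypothetical `catch_up` callback that fetches the latest state from other replicas. Failure handling and invalidation on writes are left out.

```python
class Coordinator:
    """Hypothetical per-datacenter coordinator: remembers the entity groups
    for which the local replica has observed every committed write."""

    def __init__(self):
        self.up_to_date = set()

    def validate(self, group):
        self.up_to_date.add(group)

    def invalidate(self, group):   # e.g. after a write this replica missed
        self.up_to_date.discard(group)


def read(group, local_replica, coordinator, catch_up):
    """Return (value, path) for one entity group."""
    if group in coordinator.up_to_date:
        return local_replica[group], "local"       # fast path: no remote traffic
    local_replica[group] = catch_up(group)         # slow path: catch up first
    coordinator.validate(group)                    # later reads can be local
    return local_replica[group], "catch-up"


coord = Coordinator()
replica = {"alice": "v1"}   # stale local copy
latest = {"alice": "v2"}    # committed state held by other replicas
print(read("alice", replica, coord, latest.get))  # ('v2', 'catch-up')
print(read("alice", replica, coord, latest.get))  # ('v2', 'local')
```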
13. Modified Paxos – Fast Writes
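Skip the “prepare” phase for the next write by the same leader: a successful write carries an implied prepare for the next log position, so if no other writer intervenes, the next write goes straight to “accept”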
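A toy model makes the saving visible by listing the phases each write needs. This is a deliberately simplified sketch: `FastWriteLog` is hypothetical, and ballots, failures, and actual message exchange are omitted; it only tracks which writer holds the implied prepare for the next position.

```python
class FastWriteLog:
    """Hypothetical model of the fast-write rule for one entity group's log."""

    def __init__(self):
        self.entries = []
        self.next_position_owner = None   # writer holding the implied prepare

    def write(self, writer, value):
        phases = []
        if self.next_position_owner != writer:
            phases.append("prepare")       # first write, or another writer intervened
        phases.append("accept")
        self.entries.append(value)
        self.next_position_owner = writer  # success implies a prepare for the next position
        return phases


log = FastWriteLog()
print(log.write("A", "w1"))  # ['prepare', 'accept']
print(log.write("A", "w2"))  # ['accept']  (fast path)
print(log.write("B", "w3"))  # ['prepare', 'accept']
print(log.write("A", "w4"))  # ['prepare', 'accept']
```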
14. Modified Paxos – New Replica Types
Full Replicas
• the replicas considered so far: they vote and store entity data
Witness Replicas
• are able to vote
• store but do not apply write-ahead logs
• do not store entity data
Read-only Replicas
• are not able to vote
• hold consistent, possibly slightly stale snapshots of entity data
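The replica types differ mainly in whether they count toward a write quorum. A minimal sketch, assuming the quorum is a simple majority of voting replicas (full and witness); `CAPABILITIES`, `write_quorum`, and `is_committed` are hypothetical names.

```python
from enum import Enum


class ReplicaType(Enum):
    FULL = "full"
    WITNESS = "witness"
    READ_ONLY = "read-only"


# Capabilities per type, as listed on the slide.
CAPABILITIES = {
    ReplicaType.FULL: {"votes": True, "entity_data": True},
    ReplicaType.WITNESS: {"votes": True, "entity_data": False},
    ReplicaType.READ_ONLY: {"votes": False, "entity_data": True},
}


def write_quorum(replicas):
    """Majority of the voting replicas; read-only replicas do not count."""
    voters = sum(1 for r in replicas if CAPABILITIES[r]["votes"])
    return voters // 2 + 1


def is_committed(acks, replicas):
    """acks: types of the replicas that accepted the write."""
    votes = sum(1 for r in acks if CAPABILITIES[r]["votes"])
    return votes >= write_quorum(replicas)


replicas = [ReplicaType.FULL, ReplicaType.FULL, ReplicaType.WITNESS, ReplicaType.READ_ONLY]
print(write_quorum(replicas))                                             # 2
print(is_committed([ReplicaType.FULL, ReplicaType.WITNESS], replicas))    # True
print(is_committed([ReplicaType.FULL, ReplicaType.READ_ONLY], replicas))  # False
```

A witness thus helps reach a majority without storing entity data, while a read-only replica serves reads without slowing down writes.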
20. Conclusion
Megastore and its motivation
Features of Megastore
• It has ACID semantics within an entity group
• but applications need to define entity groups
• and need to handle updates that span entity groups
Scalability and Availability
More experiments are needed