Megastore

            Providing Scalable, Highly Available
              Storage for Interactive Services


                                    Paper by Jason Baker et al.
                                   Presented by Arinto Murdopo
                                                 arinto@kth.se
7/11/2012                                                    1
Outline
Motivation
Megastore:
   Features
   Scalability
   Availability
   Putting them all together
Observation
Conclusions



Motivation
   Conflicting requirements
       •    RDBMS – easy to use, but does not scale
       •    NoSQL – scales, but is not easy to use

   Interactive online services
       •    Highly available and fast response time




Here comes Megastore
 easy to use
     • ACID semantics

 scalable
     • data partitioning

 highly available
     • synchronous replication through modified
       Paxos




Easy to use - Features
cost-transparent APIs
     • No API for joins
     • Joins are implemented in application code

data model
     •      schema, table (entity), property
     •      entity clustering
     •      indexes: local, global
     •      Bigtable column name == Megastore table
            name and property name, e.g. User.name
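
The column-naming rule in the last bullet can be sketched as follows. The helper name `bigtable_column` and the `Photo`/`tag` pair are hypothetical; only the `User.name` form comes from the slide.

```python
def bigtable_column(table: str, prop: str) -> str:
    """Build a Bigtable column name from a Megastore table and property.

    Assumption: the two names are joined with a dot, as in the
    slide's `User.name` example.
    """
    return f"{table}.{prop}"


# Because the table name is part of the column name, entities from
# different tables can share a Bigtable row space without colliding.
columns = [bigtable_column("User", "name"), bigtable_column("Photo", "tag")]
print(columns)
```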



Easy to use - Features
transactions and concurrency control
     • Bigtable timestamps used for multiversion concurrency control (MVCC)
     • transaction lifecycle: read, application logic,
       commit, apply, clean up

others
     • backup system of transaction logs
     • encryption
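
The transaction lifecycle listed above (read, application logic, commit, apply, clean up) can be sketched with an in-memory log for a single entity group. This is a simplification under stated assumptions: `EntityGroupLog` and `run_transaction` are hypothetical names, and the commit step here is a local position check, whereas the real commit is a Paxos round across replicas.

```python
class EntityGroupLog:
    """In-memory stand-in for one entity group's write-ahead log (hypothetical)."""

    def __init__(self):
        self.entries = []   # committed log entries; index == log position
        self.data = {}      # entity data after applying the log
        self.applied = 0    # number of log entries already applied

    def read_position(self):
        # Read: note the next free log position.
        return len(self.entries)

    def commit(self, expected_position, mutations):
        # Commit: succeeds only if no other writer took this log position.
        if expected_position != len(self.entries):
            return False
        self.entries.append(dict(mutations))
        return True

    def apply(self):
        # Apply: write committed mutations into the entity data.
        while self.applied < len(self.entries):
            self.data.update(self.entries[self.applied])
            self.applied += 1


def run_transaction(log, logic):
    pos = log.read_position()            # read
    mutations = logic(dict(log.data))    # application logic
    if not log.commit(pos, mutations):   # commit (optimistic)
        return False                     # lost the race; caller may retry
    log.apply()                          # apply
    return True                          # clean up is omitted in this sketch
```

Two writers that read the same position contend for it; only the first commit succeeds, which is the optimistic concurrency behaviour the lifecycle implies.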




Scalable
Scale the replication scheme
 Data partitioning
   • Entity group concept

 Data locality
   • Entity group locality
   • Bigtable instances locality




Entity Groups
An entity is like an instance (row) of a table.
An entity group is a group of entities, e.g.
     Email Application
            • Email account

     Blog Application
            • User Profile
            • Blog post + metadata
            • Blog unique name
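
As a rough sketch of the email example, each entity can carry the key of its root (the email account), and the root key decides which entity group it belongs to. The key layout `(root, kind, id)` and all names below are assumptions for illustration, not Megastore's actual key encoding.

```python
from collections import defaultdict

# Hypothetical entities: (root key, kind, id). The root key names the entity group.
entities = [
    ("account:alice", "Message", "m1"),
    ("account:alice", "Message", "m2"),
    ("account:bob", "Message", "m1"),
]


def group_by_entity_group(rows):
    """Partition entities by root key, i.e. by entity group."""
    groups = defaultdict(list)
    for root, kind, ident in rows:
        groups[root].append((kind, ident))
    return dict(groups)


groups = group_by_entity_group(entities)
print(sorted(groups))
```

Each resulting group is a unit of partitioning: it can be stored and replicated independently of the others.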




Entity Groups




Highly Available
Mutations to each entity group's write-ahead log are replicated
using a modified Paxos. But first, let's revisit the original Paxos…




Modified Paxos – Fast Reads
Read in original Paxos




Modified Paxos – Fast Reads
Contact the coordinator and read locally if the local replica is up to date
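
A minimal sketch of the fast-read idea, assuming the coordinator is simply a set of entity groups whose local replica has seen every write. `Coordinator`, `read`, and `majority_read` are hypothetical names.

```python
class Coordinator:
    """Tracks entity groups whose local replica is known to be up to date."""

    def __init__(self):
        self.up_to_date = set()

    def validate(self, group):
        self.up_to_date.add(group)

    def invalidate(self, group):
        self.up_to_date.discard(group)

    def is_up_to_date(self, group):
        return group in self.up_to_date


def read(group, coordinator, local_replica, majority_read):
    """Serve the read locally when allowed, otherwise ask a majority."""
    if coordinator.is_up_to_date(group):
        return local_replica[group], "local"
    return majority_read(group), "majority"
```

For example, after `coordinator.validate("g1")` a read of `"g1"` is served from `local_replica`; after `coordinator.invalidate("g1")` it falls back to `majority_read`.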




Modified Paxos – Fast Writes
Skip the “prepare” stage for subsequent writes through the same
leader, provided no other writer has written in the meantime
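
One way to picture the fast-write path: the leader grants the shortcut to the first writer that asks for a given log position, and every other writer falls back to the full two-phase protocol. The `Leader` class and `write_phases` function are hypothetical and ignore failures and timeouts.

```python
class Leader:
    """Leader replica deciding who may skip "prepare" (hypothetical sketch)."""

    def __init__(self):
        self.granted = {}  # log position -> writer that won the fast path

    def request_fast_path(self, position, writer):
        # The first writer to ask for a position wins; later ones do not.
        winner = self.granted.setdefault(position, writer)
        return winner == writer


def write_phases(leader, position, writer):
    """Return the Paxos phases this writer has to run for the position."""
    if leader.request_fast_path(position, writer):
        return ["accept"]              # "prepare" skipped
    return ["prepare", "accept"]       # full two-phase Paxos
```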




Modified Paxos – New Replica
Types
   Full Replicas
        all replicas that we have seen until now

   Witness Replicas
        are able to vote
        store but do not apply write-ahead logs
        do not store entity data

   Read-only Replicas
        are not able to vote
        snapshots of entity data
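
The three replica types can be summarised by two capabilities taken from the list above: whether a replica votes and whether it stores entity data. The `ReplicaType` class and `quorum_size` helper are illustrative assumptions, not part of Megastore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaType:
    name: str
    votes: bool         # takes part in Paxos voting
    stores_data: bool   # holds entity data that can serve reads


FULL = ReplicaType("full", votes=True, stores_data=True)
WITNESS = ReplicaType("witness", votes=True, stores_data=False)
READ_ONLY = ReplicaType("read-only", votes=False, stores_data=True)


def quorum_size(replicas):
    """Majority among voting replicas only."""
    voters = sum(1 for r in replicas if r.votes)
    return voters // 2 + 1


print(quorum_size([FULL, FULL, WITNESS, READ_ONLY]))
```

In this sketch a witness helps form a quorum without holding data, while a read-only replica serves data without affecting the quorum.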

Putting them all together
Megastore Architecture




Reads

   • Query Local
   • Find Position
   • Catchup
   • Validate
   • Query Data
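
The five read steps can be strung together in a small sketch. It assumes log positions are plain integers and reduces each step to a label; `current_read` and its parameters are hypothetical.

```python
def current_read(local_up_to_date, local_position, majority_position):
    """Return (position to read at, steps taken) for one entity group."""
    steps = ["query local"]  # ask the coordinator about the local replica
    # Find position: trust the local replica only if it is up to date.
    position = local_position if local_up_to_date else majority_position
    steps.append("find position")
    if local_position < position:
        steps.append("catchup")    # fetch and apply the missing log entries
    if not local_up_to_date:
        steps.append("validate")   # tell the coordinator the replica caught up
    steps.append("query data")
    return position, steps
```

When the local replica is already up to date, the catchup and validate steps drop out and the read touches only local state.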


Writes

   • Accept Leader
   • Prepare
   • Accept
   • Invalidate
   • Apply
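
The write steps can be sketched the same way. This version assumes every replica is a full, voting replica and does not retry after a failed round; `write` and its parameters are hypothetical.

```python
def write(replicas, accepted_by, leader_fast_path):
    """Return (steps taken, replicas to invalidate, success flag).

    replicas:         names of all replicas (assumed full and voting)
    accepted_by:      set of replicas that accepted the value
    leader_fast_path: True if the leader accepted without "prepare"
    """
    steps = ["accept leader"]
    if not leader_fast_path:
        steps.append("prepare")
    steps.append("accept")
    if len(accepted_by) <= len(replicas) // 2:
        return steps, [], False       # no majority; the write did not commit
    # Invalidate: replicas that missed the write must not serve local reads.
    stale = sorted(set(replicas) - set(accepted_by))
    steps.append("invalidate")
    steps.append("apply")
    return steps, stale, True
```

Invalidating the coordinators of replicas that missed the write is what keeps the local-read shortcut safe.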


Observation - Availability




Observation – Latency




Conclusion
   Megastore and its motivation

   Features of megastore
      •     It has ACID semantics
      •     But entity groups must be defined
      •     Inter-group updates must be handled separately

   Scalability and Availability

   More experiments are needed





Megastore - ID2220 Presentation
