Scaling databases on the cloud

Deepak Anupalli
Server Architect

Cloud Computing - Coming of Age
A Treatise on Real-Life Use Cases

Copyright (c) 2009, Pramati Technologies Private Limited. Imaginea is a Pramati business. All
trade names and trade marks are owned by their respective owners
11/4/2009
We are
 •   An emerging leader in product
     development services offering
     specialized services in Product
     Engineering, Interaction design
     and Test engineering.
 •   US Headquarters in Sunnyvale,
     CA; India development centers in
     Hyderabad and Chennai
 •   A 250+ strong and growing team
 •   A business unit of Pramati
     Technologies
 •   Rich Experience in SaaS
     Engineering, Performance
     engineering, Cloud Computing,
     Web2.0, sf.com integrations and
     managing Amazon EC2
     Deployment
 •   Track record of delivering
     significant customer satisfaction
Initiatives in Cloud
• Dekoh:
  http://www.dekoh.com
• SocialTwist:
  http://www.socialtwist.com
• MyPicks Beijing 2008:
  http://apps.new.facebook.com/mypicksbeijing/Home
• Qontext:
  http://www.qontext.com
Application requirements

• High reliability
• Low Latency
• Dynamic Scalability
   – Millions of Users
   – Volumes of data
• Across the tiers
   – Web
   – Application
   – Data
Our biggest challenge

• Database performance is bound by disk I/O
• Vertical scaling is an option
   – Ex: PlentyOfFish.com: 512 GB RAM, 32 CPUs
   – Expensive
   – Only possible to an extent on cloud servers
Vertical Scaling: Limitations
  • Not everything will fit in memory
  • Many reads cause many page faults and disk seeks
  • RAID 6 or RAID 10 disks
  • 200 MB/s to 1 GB/s is the maximum disk speed

         Think Horizontal!
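
To put the throughput ceiling in perspective, here is a rough back-of-the-envelope sketch. It assumes a hypothetical 512 GB dataset (a round number chosen only for illustration) that must be read sequentially from disk; the function name is ours, not from the slides.

```python
# Rough arithmetic: how long does one sequential pass over the data take
# at the disk throughput limits quoted above? All sizes are assumptions.
GB = 10**9
MB = 10**6


def full_scan_seconds(data_bytes: int, throughput_bytes_per_s: int) -> float:
    """Time to read data_bytes once at a sustained throughput."""
    return data_bytes / throughput_bytes_per_s


dataset = 512 * GB                                   # hypothetical dataset size
slow_scan = full_scan_seconds(dataset, 200 * MB)     # lower bound of the range
fast_scan = full_scan_seconds(dataset, 1 * GB)       # upper bound of the range

print(f"at 200 MB/s: {slow_scan / 60:.1f} minutes")
print(f"at 1 GB/s:   {fast_scan / 60:.1f} minutes")
```

Even at the top of the range, a single machine needs minutes for one pass, which is the motivation for spreading data across several machines.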
Replication
 • Master-slave replication (MySQL or Oracle RAC)
 • Writes on one Master
 • Reads on many Slaves
 • Application aware
 • Works in read mostly scenario
 • Adds Slave lag

 [Diagram: Writes go to one Master, which propagates them to three Slaves; Reads are served by the Slaves]
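
"Application aware" means the application decides which server receives each statement. A minimal sketch of such routing follows, assuming connections are opaque objects (plain strings here) and that any statement starting with SELECT may be served by a slave. The class and method names are hypothetical.

```python
import itertools


class ReplicationRouter:
    """Sends writes to the master and spreads reads over the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over slaves; fall back to the master if there are none.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def connection_for(self, sql: str):
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT" and self._slaves is not None:
            return next(self._slaves)
        # INSERT/UPDATE/DELETE and anything unrecognised go to the master.
        return self.master


router = ReplicationRouter("master", ["slave1", "slave2"])
print(router.connection_for("INSERT INTO profile VALUES (1, 'a')"))
print(router.connection_for("SELECT * FROM profile WHERE id = 1"))
```

Because of slave lag, a read issued right after a write may not see that write on a slave. An application that needs to read its own writes would have to send those reads to the master as well.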
Sharding
 • Partition data across masters
 • Writes and Reads are distributed
 • Application is modified accordingly
 • Also use replication with fewer slaves to minimize slave lag
 • Choose a partitioning strategy that uniformly distributes data

 [Diagram: Shard Logic routes requests to three Masters, each with its own Slave]
Sharding Schemes
 •   Vertical
      – Profile DB, friend DB
      – Not uniform
 •   Range based
      – ID range, Location or Date based
      – Not uniform
 •   Key or Hash based
      – ID hash
      – Fixed masters
 •   Directory
      – Mapping of ID to Shard
      – Single point of failure

 Shard lookup shown on the slide:
      shard_id = getShard("profile")
      shard_id = getShard(profileID)
      Select * from Profile where id = ?

 [Diagram labels: Corporate, Corporate; Tweets, Posts]
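
The four schemes can be sketched as four small lookup functions. This is an illustration only: the function names, the range boundaries and the table-to-shard mapping are assumptions, not the deck's `getShard` implementation.

```python
import bisect
import hashlib

# Vertical: each kind of data lives on its own shard (hypothetical mapping).
VERTICAL_MAP = {"profile": 0, "friend": 1}


def vertical_shard(table: str) -> int:
    return VERTICAL_MAP[table]


# Range based: exclusive upper bounds per shard (hypothetical boundaries).
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]


def range_shard(profile_id: int) -> int:
    idx = bisect.bisect_right(RANGE_UPPER_BOUNDS, profile_id)
    if idx >= len(RANGE_UPPER_BOUNDS):
        raise ValueError(f"no shard configured for id {profile_id}")
    return idx


# Key or hash based: a stable hash modulo a fixed number of masters.
def hash_shard(key: str, num_shards: int) -> int:
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Directory: an explicit ID-to-shard mapping that must itself be kept available.
def directory_shard(directory: dict, profile_id: int) -> int:
    return directory[profile_id]


print(vertical_shard("profile"), range_shard(1_500_000), hash_shard("42", 3))
```

A stable hash such as MD5 is used rather than Python's built-in `hash()`, whose value for strings changes between interpreter runs.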
Sharding Complexities
 •   No Joins
      – De-normalize the data
 •   Data Integrity
      – Application should enforce integrity
 •   Re-shard
      – Changing the sharding scheme requires re-partitioning the entire data
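
The cost of re-sharding is easy to see with a simple modulo scheme. The sketch below assumes integer IDs placed by `id % number_of_shards` (an illustrative scheme, not necessarily the one the authors used) and counts how many IDs change shard when one master is added.

```python
def moved_fraction(num_keys: int, old_shards: int, new_shards: int) -> float:
    """Fraction of ids 0..num_keys-1 whose shard changes under id % shard_count."""
    moved = sum(1 for k in range(num_keys) if k % old_shards != k % new_shards)
    return moved / num_keys


# Growing from 3 to 4 masters relocates most of the data.
print(moved_fraction(12_000, 3, 4))
```

With 3 and 4 shards, only IDs whose remainder modulo 12 is 0, 1 or 2 stay in place, so three quarters of the rows have to be copied to a different master.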
De-normalization
 • Recent 10 messages to a recipient
 • Schema
 • Messages Table stores message info
 • Recipients Table stores
 • Requires Join on Messages & Recipients table
 • De-normalize
 • Store timestamp in Recipients table as well

 [Diagram: before, only Messages has a timestamp column; after, both Messages and Recipients have one]
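
The effect of copying the timestamp can be shown with an in-memory SQLite database. The column names below are assumptions, since the slide does not list them; the point is only that the second query finds the ten most recent messages without touching the Messages table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT, timestamp INTEGER);
    -- De-normalized: timestamp is duplicated here (hypothetical columns).
    CREATE TABLE recipients (message_id INTEGER, recipient_id INTEGER, timestamp INTEGER);
    """
)
for msg_id in range(1, 16):
    ts = 1000 + msg_id
    conn.execute("INSERT INTO messages VALUES (?, ?, ?)", (msg_id, f"message {msg_id}", ts))
    conn.execute("INSERT INTO recipients VALUES (?, ?, ?)", (msg_id, 7, ts))

# Normalized approach: ordering by time needs a join with messages.
joined = conn.execute(
    """
    SELECT m.id FROM messages m
    JOIN recipients r ON r.message_id = m.id
    WHERE r.recipient_id = ?
    ORDER BY m.timestamp DESC LIMIT 10
    """,
    (7,),
).fetchall()

# De-normalized approach: recipients alone answers the question.
denormalized = conn.execute(
    """
    SELECT message_id FROM recipients
    WHERE recipient_id = ?
    ORDER BY timestamp DESC LIMIT 10
    """,
    (7,),
).fetchall()

print(joined == denormalized)
```

In a sharded deployment the two tables may live on different masters, so the join-free query is the only one that can run on a single shard.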
Relationships

• When data is partitioned into shards,
  foreign keys become obsolete
• De-normalization avoids having
  relationships
• If data can't be de-normalized further,
  use memcached
• But, this requires change in SQL queries

[Diagram: Application talks to MemCached, which sits in front of Shard 1, Shard 2 and Shard 3]
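
One common way to place a cache between the application and the shards is the cache-aside pattern. The sketch below uses a plain dictionary wrapper as a stand-in for memcached and dictionaries as stand-ins for shards; all names, and the modulo shard choice, are assumptions made for illustration.

```python
class DictCache:
    """In-process stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def load_profile(cache, shards, profile_id):
    """Cache-aside read: try the cache first, then the owning shard."""
    key = f"profile:{profile_id}"
    row = cache.get(key)
    if row is None:
        shard = shards[profile_id % len(shards)]  # hypothetical shard choice
        row = shard[profile_id]
        cache.set(key, row)                       # populate for later reads
    return row


shards = [{3: "carol"}, {1: "alice"}, {2: "bob"}]
cache = DictCache()
print(load_profile(cache, shards, 1))  # read from shard, then cached
print(load_profile(cache, shards, 1))  # served from the cache
```

This is why the queries change: related rows are fetched by key, one lookup at a time, instead of being joined in SQL.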
Cloud Databases/Data stores

•   Amazon SimpleDB
•   Google BigTable
•   Apache HBase
•   Facebook/Apache Hive
•   CouchDB
•   Cassandra
•   Many more…
Amazon SimpleDB
•   Schema-less distributed key-value store
•   Highly reliable and scalable
•   Automatic indexing of columns
•   Querying with SQL-like syntax
•   Supports multiple values for key/attribute
•   Value for Money
Problems Addressed
• High Availability
   – multiple nodes forming a ring
• Partitioning
   – Consistent hashing
• Replication
   – Replicated to multiple nodes
• Eventual Consistency
   – Asynchronous replication of data using vector clocks
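
Consistent hashing on a ring can be sketched in a few lines. This is a generic illustration of the technique, not the implementation of any particular data store; the node names, the MD5 hash and the number of virtual nodes are assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Nodes own the arc of the ring that ends at each of their points."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed at several points to even out the distribution.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [str(i) for i in range(1000)]
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

Unlike a modulo scheme, adding a node only takes keys away from existing nodes and gives them to the new one; no key moves between the old nodes.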
SimpleDB adoption

•   No Joins
•   No transactional support
•   String is the only data type
•   No aggregate functions
•   No full-text searches
•   Limits enforced on size of results, predicates, data etc.
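
Because every value is a string, comparisons are lexicographic, so numbers have to be encoded before they are stored. A common workaround is zero-padding, plus a fixed offset when negative values are possible. The helper below is a sketch; the width and offset are assumptions to be chosen per attribute.

```python
def encode_int(value: int, width: int = 10, offset: int = 0) -> str:
    """Encode an integer so that string order matches numeric order."""
    shifted = value + offset
    if shifted < 0:
        raise ValueError("offset too small for this value")
    text = str(shifted)
    if len(text) > width:
        raise ValueError("width too small for this value")
    return text.zfill(width)


# Unpadded strings sort incorrectly: "9" comes after "10".
print("9" > "10")
# Padded strings sort like the numbers they represent.
print(encode_int(9), encode_int(10), encode_int(9) < encode_int(10))
```

The same width and offset must be used for every value of an attribute, and for the literals used in queries against it.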
Google BigTable
•   Distributed Key-value store
•   Runs on top of Google File System (GFS)
•   Timestamp versioned data
•   Automatic indexing of columns
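
"Timestamp versioned data" means a cell can hold several values, each tagged with a timestamp. The toy in-memory class below illustrates the idea only; the class and its methods are hypothetical and are not BigTable's API.

```python
import bisect
from collections import defaultdict


class VersionedTable:
    """Maps (row, column) to a time-ordered list of (timestamp, value)."""

    def __init__(self):
        self._cells = defaultdict(list)

    def put(self, row, column, timestamp, value):
        # Keep versions sorted by timestamp so the newest is always last.
        bisect.insort(self._cells[(row, column)], (timestamp, value))

    def get(self, row, column, as_of=None):
        versions = self._cells.get((row, column), [])
        if as_of is not None:
            versions = [v for v in versions if v[0] <= as_of]
        return versions[-1][1] if versions else None


table = VersionedTable()
table.put("user1", "status", 1, "old")
table.put("user1", "status", 5, "new")
print(table.get("user1", "status"))           # latest version
print(table.get("user1", "status", as_of=3))  # version visible at time 3
```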
BigTable adoption
• Google Search, Maps, Earth, Orkut, YouTube,
  Reader, etc.
• Google App Engine (GAE) uses BigTable as its
  datastore
• DataNucleus supports JPA for BigTable
• Limited transaction support
• Eventual consistency
Hive
 • Hive is a data warehouse
 • Runs on top of Hadoop Distributed
   File System (HDFS)
 • Supports SQL-like syntax
 • User defined types and functions
 • Extensibility with Map-Reduce
Hive adoption
 • Facebook uses Hive to analyze historical
   data of users and content
 • Doesn’t support indexing of columns
 • Brute force mechanism to compute
   analytics
CouchDB
•   CouchDB is a document-oriented datastore
•   Schema-free
•   Accessible through RESTful JSON API
•   Distributed with incremental replication
•   Querying through JavaScript
Is there a solution for all?


• Different data-stores address different problem spaces
• Identify what best suits your app
Thank You
   deepak@pramati.com



http://hysea.in
Copyright © 2009, Imaginea Inc. Not to be distributed or communicated without permission.
