Level 300




     Haytham ElFadeel
     Researcher in Computer Sciences
Agenda
• Introduction
  – A glance at scalable systems.
  – The available storage solutions.
  – The problem with the current solutions.
  – The problem with the database.
• The Next Generation of Storage Systems
  – Key-Value store systems.
  – Performance comparison.
• How it works
• Discussions, Q/A
Glance at Scalable Systems

• Scalable systems
  – Scalability is the ability to provide better
    performance when you add more computing
    power.
  – The performance gain should be proportional to the
    added computing power.
  – Examples: Google, Yahoo, Facebook, Amazon, eBay,
    Orkut, Google App Engine, etc.
Glance at Scalable Systems

• Types of scalability
  – Vertical Scalability: Adding resources within the
    same logical unit to increase capacity. For
    example: adding more CPUs, or expanding the storage
    or the memory.
  – Horizontal Scalability: Adding multiple logical units of
    resources and making them work together as a single
    unit. Think of clustering, distributed systems,
    and load balancing.
Vertical Scalability vs. Horizontal Scalability

      Vertical Scaling     Horizontal Scaling

      Limited              Not limited
      Hardware only        Software and hardware
Vertical Scalability vs. Horizontal Scalability
Haytham ElFadeel Quote:
 If you need scalability urgently, vertical
 scaling will probably be the easiest route,
 but be aware that vertical scaling gets
 more and more expensive as you grow, and
 while infinite horizontal linear scalability is
 difficult to achieve, infinite vertical scalability
 is impossible.
Vertical Scalability vs. Horizontal Scalability
Haytham ElFadeel Quote:
 On the other hand, horizontal scalability
 doesn’t require you to buy more and more
 expensive hardware. It’s meant to be scaled
 using commodity storage and server solutions.
 But horizontal scalability isn’t cheap either.
 The application has to be built from the ground up
 to run on multiple servers as a single application.
Glance at Scalable Systems
• Facebook
  – More than 200,000,000 active users.
  – 50,000 photos uploaded per minute.
  – The most active social network on the Web.
• Facebook chat
  – The main challenge is maintaining users' status.
  – Load distribution should depend on the users
    and their friends, to avoid data traveling between servers.
  – Building a system that should scale from the start
    to serve 100,000,000 users is really hard.
Glance at Scalable Systems

• Amazon
  – More than 10,000,000 transactions every holiday season.
  – Reliability of the user's shopping cart is not
    optional.
• Google, Yahoo, Microsoft, Kngine, etc.
  – Processing huge amounts of data, more than 1TB.
  – Sorting the index by rank value, which means
    sorting more than 1TB of data.
  – Saving the crawled Web pages.
The Available Storage Solutions

• Memory:
  – Just a Data Structure :)
• Disk:
  – Text File: { XML, Protocol Buffer, JSON }
  – Binary File: { Serialized, custom format }
  – Database: { MySQL, SQL Server, SQLite, Oracle }
The Available Storage Solutions

• Memory: (What about capacity?)
  – Just a Data Structure :)
• Disk:
  – Text File: { XML, Protocol Buffer, JSON }
    (Bad performance.)
  – Binary File: { Serialized, custom format }
    (Not portable, questions about performance.)
  – Database: { MySQL, SQL Server, SQLite, Oracle }
    (Bad performance, complex, huge latency.)
The Problem with the Database

• Causes
  – Old and very complex systems.
  – Many wasted features.
  – Many steps to process a SQL query.
  – Needs administration, and more.
The Problem with the Database

  • Causes
      – Old and very complex systems.
          • An RDBMS is a very complex system, much like an operating
            system:
              – Thread scheduling, deadlock monitor, resource manager.
              – I/O manager, page manager, execution plan manager.
              – Cache manager, memory manager, transaction manager, etc.
          • Most DBMS architectures, designs, and algorithms date from
            around the 1970s:
              – Different hardware and platform properties.
              – Old architecture, design, and algorithms.

Please review reference #1
The Problem with the Database

• Causes
  – Many wasted features.
    • Today's systems are very feature-rich, simply because
      their vendors think that ‘one size fits all’:
       – CLR Types, CLR Integration, Replication, Functions.
       – Policy, Relations, Transaction, Stored procedure, ACID, etc.
    • You can even call a Web Service from SQL Server! All of
      this makes the database look like a platform and
      development environment.
The problem with the Database

  • Causes
      – Many steps to process the query.
          • Parse the query.
          • Build the expression tree, and resolve the relational
            algebra expression.
          • Optimize the expression tree.
          • Choose the execution plan.
          • Start execution.

Please review references #2, #3
The problem with the Database

• Effects
  – Bad Performance: Throughput, Resource usage,
    Latency.
  – Not Scalable.
The problem with the Database

• Effects
   – Bad Performance: Throughput, Resource usage,
     Latency:
      • Even one of the faster DBMSs, MySQL, can’t serve more
        than 5,000 queries per second*.
      • Add to this the resources consumed and the high latency.

* Depends on the configuration.
The problem with the Database

  • Effects
      – Not scalable:
          • The database is not designed to scale.
          • Even if you add a new machine and partition the database,
            you will not get an acceptable performance
            improvement.

Please review reference #1
The problem with the Database

    The database gives us ACID:
•   Atomicity: A transaction is all or nothing.
•   Consistency: Only valid data is written to the
    database.
•   Isolation: Transactions behave as if they were
    happening serially, so the data stays correct.
•   Durability: What you write is what you get; committed
    data is not lost.
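
To make the "all or nothing" property concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `accounts` table, the `transfer` helper, and the simulated failure are assumptions made up for this illustration; they are not part of the original slides.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail=False):
    """Move `amount` from src to dst inside a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail:
                # Hypothetical failure between the two updates.
                raise RuntimeError("simulated crash")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 30, fail=True)` leaves both balances untouched, because the first update is rolled back together with the failed transaction.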
The problem with the Database

 The problem with ACID is that it gives you too
 much; it trips you up when you are trying to
 scale a system across multiple nodes.
 Downtime is unacceptable, so your system
 needs to be reliable. Reliability requires
 multiple nodes to handle machine failures.
 To make a scalable system that can handle lots
 and lots of reads and writes you need many
 more nodes.
The problem with the Database

 Once you try to scale ACID across many
 machines you hit problems with network
 failures and delays. The algorithms don't work
 in a distributed environment at any acceptable
 speed.


             It’s a dead end
The Next Generation of Storage Systems
 Long ago, many research teams and companies
 discovered that the database is the main
 bottleneck: many wasted features, bad
 performance, and a design not meant for
 scalable systems.
The Next Generation of Storage Systems
     Building large systems on top of a traditional
     RDBMS data storage layer is no longer good
     enough.
     This talk explores the landscape of new
     technologies available today to augment your
     data layer to improve performance and
     reliability.

Please review reference #4
Key-Value Storage Systems

• Simple data model, just key-value pairs.
• Every value is assigned to a key.
• No complex stuff, such as relations, ACID, or
  SQL queries.
• Simple interface:
  – Get(key)
  – Put(key, value)
  – Delete(key) (optional)
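
The interface above can be sketched in a few lines of Python. This is only an in-memory stand-in to show the shape of the API; the class name `KeyValueStore` and the choice of string keys and byte-string values are assumptions for this example, not taken from any of the systems named in this deck.

```python
from typing import Dict, Optional

class KeyValueStore:
    """In-memory stand-in for a key-value store (hypothetical name)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        """Store or overwrite the value for a key."""
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        """Return the value for a key, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        """Remove a key; return True if it existed."""
        return self._data.pop(key, None) is not None

store = KeyValueStore()
store.put("user:42", b'{"name": "Ada"}')
```

Note that there is no query language and no schema: the store treats the value as an opaque blob, and any structure inside it is the application's concern.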
Key-Value Storage Systems

• Designed from the start to scale to hundreds of
  machines.
• Designed to be reliable, even if 50% of the
  machines crash.
• No extra work required to add a new machine:
  just plug in the machine and it will work in
  harmony with the others.
• Many open source projects (C++, Java, Lisp).
Key-Value Storage Systems

• Who uses such systems:
  –   Facebook.
  –   Google Orkut, Analysis.
  –   Google Web Crawling.
  –   Amazon.
  –   Powerset.
  –   eBay.
  –   Kngine.                   – General use.
  –   Yahoo.                    – Storage, and huge data analysis.
                                – Transactions, and huge data analysis.
Key-Value Storage Systems

 You may wonder: can we really live without
 relations and ACID?!
   – The short answer: Absolutely yes.
   – The long answer: Absolutely yes, but nothing comes
     for free.
Key-Value Storage Systems

             Now you should make your decision:

  Take the blue pill and see the truth,
  or take the red pill and stay in wonderland.
Key-Value Storage Systems

 Key-value storage systems, and other systems,
 are built around the CAP concept:
 Consistency: your data is correct all the time.
 What you write is what you read.
 Availability: you can read and write
 your data all the time.
 Partition Tolerance: if one or more nodes fail,
 the system still works and becomes consistent
 when the system comes back on-line.
Key-Value Storage Systems
  One Node - Performance Comparison (Web)
  • MySQL
      – 3,030 sets/second.
      – 4,670 gets/second.
  • Redis
      – 11,200 sets/second. (3.7x MySQL)
      – 9,840 gets/second. (2.1x MySQL)
  • Tokyo Tyrant
      – 9,030 sets/second. (3.0x MySQL)
      – 9,250 gets/second. (2.0x MySQL)
Please review reference #5
Key-Value Storage Systems

Two High-End Nodes - Performance Comparison (Web)
• Redis
  – 89,230 sets/second.
  – 85,840 gets/second.
Key-Value Storage Systems

One Node - Performance Comparison
• SQL Server
   – 2,900 sets/second.
   – 3,500 gets/second.
• Vina*
   – 10,100 sets/second. (3.5x SQL Server)
   – 9,970 gets/second. (2.8x SQL Server)

* Vina : Key-Value Storage System used inside Kngine.
How It Works

Any key-value storage system consists of two
primary layers:
  – Aggregation Layer
  – Storing Layer
How It Works

Any key-value storage system consists of two
primary layers:
  – Aggregation Layer
     • Manages the instances, replication, and distribution.
  – Storing Layer
     • One or more disk-based hash tables.
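
As a rough idea of what a disk-based hash table in the storing layer might look like, the sketch below appends every record to a data file and keeps an in-memory hash index from key to the value's position on disk. This is one simple design chosen for illustration, not necessarily what any system named in this deck does; the class name `DiskHashTable` and the record layout are assumptions.

```python
import os
import struct
import tempfile

class DiskHashTable:
    """Append-only data file plus an in-memory index: key -> (offset, length)."""

    HEADER = struct.Struct("<II")  # key length, value length

    def __init__(self, path):
        self._index = {}
        self._file = open(path, "a+b")  # writes always go to the end of the file
        self._rebuild_index()

    def _rebuild_index(self):
        """Scan the file once so the index survives a restart."""
        self._file.seek(0)
        while True:
            pos = self._file.tell()
            header = self._file.read(self.HEADER.size)
            if len(header) < self.HEADER.size:
                break
            klen, vlen = self.HEADER.unpack(header)
            key = self._file.read(klen)
            self._index[key] = (pos + self.HEADER.size + klen, vlen)
            self._file.seek(vlen, os.SEEK_CUR)  # skip over the value bytes

    def put(self, key: bytes, value: bytes) -> None:
        self._file.seek(0, os.SEEK_END)
        pos = self._file.tell()
        self._file.write(self.HEADER.pack(len(key), len(value)) + key + value)
        self._file.flush()
        # A later record for the same key simply shadows the earlier one.
        self._index[key] = (pos + self.HEADER.size + len(key), len(value))

    def get(self, key: bytes):
        entry = self._index.get(key)
        if entry is None:
            return None
        offset, length = entry
        self._file.seek(offset)
        return self._file.read(length)

    def close(self) -> None:
        self._file.close()

path = os.path.join(tempfile.mkdtemp(), "store.log")
table = DiskHashTable(path)
table.put(b"user:1", b"alice")
table.put(b"user:2", b"bob")
table.put(b"user:1", b"alice-v2")  # overwrite by appending a newer record
```

A read costs one hash lookup and one disk seek, with no query parsing or planning in between. A real store would also need compaction of stale records and crash-safe writes, which this sketch leaves out.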
How It Works (Storing Layer)




           On the board
How It Works (Aggregation Layer)

  • Receives the requests.
  • Routes each request to the target node.
  • Manages partitioning and replicas.
  • Partitioning and replication are done by the
  consistent hashing algorithm.


                            On the board

Please review reference #6
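
Since the aggregation layer was explained on the board, here is a minimal sketch of a consistent hash ring that routes a key to its target node and picks replicas. The class name `ConsistentHashRing`, the use of SHA-256, and the figure of 100 virtual nodes per machine are assumptions for this example, not details from the talk.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys to nodes so that adding a node moves only a share of the keys."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes  # virtual nodes per physical node
        self._keys = []        # sorted positions on the ring
        self._owner = {}       # position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node: str) -> None:
        for i in range(self._vnodes):
            pos = self._hash(f"{node}#{i}")
            if pos in self._owner:
                continue  # position already taken (extremely unlikely)
            bisect.insort(self._keys, pos)
            self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        self._keys = [p for p in self._keys if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_nodes(self, key: str, replicas: int = 1):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        if not self._keys:
            return []
        start = bisect.bisect(self._keys, self._hash(key))
        result = []
        for i in range(len(self._keys)):
            node = self._owner[self._keys[(start + i) % len(self._keys)]]
            if node not in result:
                result.append(node)
                if len(result) == replicas:
                    break
        return result

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
primary, backup = ring.get_nodes("user:42", replicas=2)
```

The first node returned is where the key lives; the following ones are natural places for its replicas. When a new machine joins, the only keys that change owner are the ones the new machine takes over, which is what lets a node be added without reshuffling the whole data set.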
Key-Value Storage Systems

 • Amazon Dynamo.          < Paper
 • Facebook Cassandra.     < Open source
 • Tokyo Cabinet/Tyrant.   < Open source
 • Redis                   < Open source
 • MongoDB                 < Open source
Q/A
References
 1. The End of an Architectural Era (It’s Time for a Complete
    Rewrite). Paper.
 2. Database Systems - Paul Beynon-Davies. Book.
 3. Inside SQL Server engine - MS Press. Book.
 4. Drop ACID and Think About Data. Highscalability.com.
 5. Redis vs MySQL vs Tokyo Tyrant. Colin Howe’s Blog.
 6. Consistent Hashing and Random Trees: Distributed
    Caching Protocols for Relieving Hot Spots on the World
    Wide Web. Paper.
 7. Dynamo: Amazon’s Highly Available Key-value Store.
    Paper.
 8. Redis and Tokyo Tyrant projects.
 9. Consistent Hashing. Tom White's blog.
Resources

 1. High Scalability blog.
    Highscalability.com
 2. It’s all about innovation blog.
    Hfadeel.com/blog.
 3. All Things Distributed.
    Allthingsdistributed.com
 4. Tom White blog
    lexemetech.com
Thanks…
Dear all,
  All of my presentation content is open source.
  Please feel free to use, copy, and re-distribute it.

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Último (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Storage Systems for Scalable Systems

  • 1. Level 300 Haytham ElFadeel Researcher in Computer Sciences
  • 2. Agenda • Introduction – A glance at scalable systems. – What storage solutions are available. – The problem with the current solutions. – The problem with the database. • The Next Generation of Storage Systems – Key-value store systems. – Performance comparison. • How It Works • Discussion, Q/A
  • 3. Glance at Scalable Systems • Scalable systems – Scalability is the ability to provide better performance when you add more computing power. – The performance gained should be proportional to the added computing power. – Examples: Google, Yahoo, Facebook, Amazon, eBay, Orkut, Google App Engine, etc.
  • 4. Glance at Scalable Systems • Scalability types – Vertical scalability: adding resources within the same logical unit to increase capacity, for example adding more CPUs or expanding the storage or the memory. – Horizontal scalability: adding multiple logical units of resources and making them work together as a single unit. Think of clustering, distributed systems, and load balancing.
  • 5. Vertical Scalability vs. Horizontal Scalability • Vertical scaling: limited; hardware only. • Horizontal scaling: not limited; software and hardware.
  • 6. Vertical Scalability vs. Horizontal Scalability Haytham ElFadeel Quote: If you need scalability urgently, vertical scaling will probably be the easiest route, but be aware that vertical scaling gets more and more expensive as you grow. And while infinite horizontal linear scalability is difficult to achieve, infinite vertical scalability is impossible.
  • 7. Vertical Scalability vs. Horizontal Scalability Haytham ElFadeel Quote: On the other hand, horizontal scalability doesn’t require you to buy more and more expensive hardware. It’s meant to be scaled using commodity storage and server solutions. But horizontal scalability isn’t cheap either. The application has to be built from the ground up to run on multiple servers as a single application.
  • 8. Glance at Scalable Systems • Facebook – More than 200,000,000 active users. – 50,000 photos uploaded per minute. – The most active social network on the Web. • Facebook chat – The main challenge is maintaining user status. – Load distribution should depend on the users and their friends, to avoid requests traveling between servers. – Building a system that has to scale from the start to serve 100,000,000 users is really hard.
  • 9. Glance at Scalable Systems • Amazon – More than 10,000,000 transactions every holiday season. – The reliability of the user's shopping cart is not optional. • Google, Yahoo, Microsoft, Kngine, etc. – Processing huge amounts of data, more than 1 TB. – Sorting the index by rank value, which means sorting more than 1 TB of data. – Saving the crawled Web pages.
  • 10. The Available Storage Solutions • Memory: – Just a data structure :) • Disk: – Text file: { XML, Protocol Buffers, JSON } – Binary file: { serialized, custom format } – Database: { MySQL, SQL Server, SQLite, Oracle }
  • 11. The Available Storage Solutions • Memory: – Just a data structure :) (What about capacity?) • Disk: – Text file: { XML, Protocol Buffers, JSON } (Bad performance.) – Binary file: { serialized, custom format } (Not portable, questions about performance.) – Database: { MySQL, SQL Server, SQLite, Oracle } (Bad performance, complex, huge latency.)
  • 12. The Problem with the Database • Causes – Old and very complex system. – Many wasted features. – Many steps to process the SQL query. – Needs administration, among other things.
  • 13. The Problem with the Database • Causes – Old and very complex system. • The RDBMS is a very complex system, much like an operating system: – Thread scheduling, deadlock monitor, resource manager. – I/O manager, page manager, execution plan manager. – Cache manager, memory manager, transaction manager, etc. • Most DBMS architectures, designs, and algorithms date from around the 1970s: – Different hardware and platform properties. – Old architecture, design, and algorithms. Please review reference #1
  • 14. The Problem with the Database • Causes – Many wasted features. • Today's systems are very feature-rich, simply because their designers believe that ‘one size fits all’: – CLR types, CLR integration, replication, functions. – Policies, relations, transactions, stored procedures, ACID, etc. • You can even call a Web service from SQL Server! All this mess makes the database look like a platform and development environment.
  • 15. The Problem with the Database • Causes – Many steps to process the query. • Parse the query. • Build the expression tree and resolve the relational algebra expression. • Optimize the expression tree. • Choose the execution plan. • Start execution. Please review references #2, #3
  • 16. The problem with the Database • Effects – Bad Performance: Throughput, Resource usage, Latency. – Not Scalable.
  • 17. The Problem with the Database • Effects – Bad performance: throughput, resource usage, latency: • Even the fastest DBMS, MySQL, can’t provide more than 5,000 queries per second*. • Add to this the resources consumed and the high latency. * Depends on the configuration
  • 18. The Problem with the Database • Effects – Not scalable: • The database is not designed to scale. • Even if you add a new machine and partition the database, you will never get an acceptable performance improvement. Please review reference #1
  • 19. The Problem with the Database The database gives us ACID: • Atomicity: a transaction is all or nothing. • Consistency: only valid data is written to the database. • Isolation: transactions behave as if they all happen serially, and the data is correct. • Durability: what you write is what you get.
  • 20. The Problem with the Database The problem with ACID is that it gives you too much; it trips you up when you are trying to scale a system across multiple nodes. Downtime is unacceptable, so your system needs to be reliable. Reliability requires multiple nodes to handle machine failures. To make a scalable system that can handle lots and lots of reads and writes, you need many more nodes.
  • 21. The Problem with the Database Once you try to scale ACID across many machines you hit problems with network failures and delays. The algorithms don't work in a distributed environment at any acceptable speed. It’s a dead end.
  • 22. The Next Generation of Storage Systems Long ago, many research teams and companies discovered that the database is the main bottleneck: many wasted features, bad performance, and not designed for scalable systems.
  • 23. The Next Generation of Storage Systems Building large systems on top of a traditional RDBMS data storage layer is no longer good enough. This talk explores the landscape of new technologies available today to augment your data layer to improve performance and reliability. Please review reference #4
  • 24. Key-Value Storage Systems • Simple data model, just key-value pairs. • Every value is assigned to a key. • No complex stuff such as relations, ACID, or SQL queries. • Simple interface: – Get(key) – Put(key, value) – Delete(key) (optional)
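
The three-operation interface on this slide can be sketched in a few lines. This is a minimal in-memory illustration only, not the API of any system named in the talk; the class name `KeyValueStore` and the choice of string keys and byte values are assumptions made for the example.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory store exposing Get, Put, and Delete (hypothetical names)."""

    def __init__(self) -> None:
        # One flat mapping: no relations, no schema, no query language.
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Writing to an existing key overwrites the previous value.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Returns True if a value was removed, False if the key was absent.
        return self._data.pop(key, None) is not None
```

A caller only ever names a key; anything resembling a join or a secondary lookup has to be built by the application on top of these calls.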
  • 25. Key-Value Storage Systems • Designed from the start to scale to hundreds of machines. • Designed to be reliable, even if 50% of the machines crash. • No extra work is required to add a new machine; just plug it in and it will work in harmony. • Many open source projects (C++, Java, Lisp).
  • 26. Key-Value Storage Systems • Who uses such systems: – Facebook. – Google Orkut, Analysis. – Google Web Crawling. – Amazon. – Powerset. – eBay. – Kngine. – General use. – Yahoo. – Storing, and huge data analysis. – Transactions, and huge data analysis.
  • 27. Key-Value Storage Systems You may wonder: can we really live without relations and ACID?! – The short answer: absolutely yes. – The long answer: absolutely yes, but nothing is free.
  • 28. Key-Value Storage Systems Now you should make your decision: take the red pill and stay in Wonderland, or take the blue pill and see the truth.
  • 29. Key-Value Storage Systems Key-value storage systems, and other systems, are built around the CAP concept: Consistency: your data is correct all the time. What you write is what you read. Availability: you can read and write your data all the time. Partition tolerance: if one or more nodes fail, the system still works, and it becomes consistent again when the system comes back online.
  • 30. Key-Value Storage Systems One Node - Performance Comparison (Web) • MySQL – 3,030 sets/second. – 4,670 gets/second. • Redis – 11,200 sets/second. (3.7x MySQL) – 9,840 gets/second. (2.1x MySQL) • Tokyo Tyrant – 9,030 sets/second. (3.0x MySQL) – 9,250 gets/second. (2.0x MySQL) Please review reference #5
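
The multipliers on this slide are each system's throughput divided by the MySQL figure for the same operation. The short sketch below reproduces that arithmetic from the numbers above; the dictionary layout and the `speedup` helper are made up for the illustration.

```python
# Throughput figures (operations per second) copied from the single-node comparison.
results = {
    "MySQL": {"sets": 3030, "gets": 4670},
    "Redis": {"sets": 11200, "gets": 9840},
    "Tokyo Tyrant": {"sets": 9030, "gets": 9250},
}


def speedup(system: str, op: str, baseline: str = "MySQL") -> float:
    """Throughput of `system` relative to `baseline`, rounded to one decimal."""
    return round(results[system][op] / results[baseline][op], 1)


for system in ("Redis", "Tokyo Tyrant"):
    print(f"{system}: {speedup(system, 'sets')}x sets, {speedup(system, 'gets')}x gets")
```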
  • 31. Key-Value Storage Systems Two High-End Nodes - Performance Comparison (Web) • Redis – 89,230 sets/second. – 85,840 gets/second.
  • 32. Key-Value Storage Systems One Node - Performance Comparison • SQL Server – 2,900 sets/second. – 3,500 gets/second. • Vina* – 10,100 sets/second. (3.5x SQL Server) – 9,970 gets/second. (2.8x SQL Server) * Vina: key-value storage system used inside Kngine.
  • 33. How It Works Any key-value storage system consists of two primary layers: – Aggregation layer – Storing layer
  • 34. How It Works Any key-value storage system consists of two primary layers: – Aggregation layer • Manages the instances, replication, and distribution. – Storing layer • One or many disk-based hash tables.
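
The slides leave the storing layer's design to the whiteboard, so the following is only a rough sketch of one way a single node might persist key-value pairs. It swaps in a simpler technique than a true on-disk hash table: an append-only log file plus an in-memory hash index that maps each key to the position of its latest value. The class name `LogHashStore` and the record layout are assumptions made for this example.

```python
import os
import struct
from typing import Dict, Optional, Tuple

# Record layout (an assumption): 4-byte key length, 4-byte value length, key, value.
_HEADER = struct.Struct(">II")


class LogHashStore:
    """Append-only data file with an in-memory hash index (illustrative only)."""

    def __init__(self, path: str) -> None:
        # key -> (offset of the value in the file, value length)
        self._index: Dict[bytes, Tuple[int, int]] = {}
        self._file = open(path, "a+b")
        self._rebuild_index()

    def _rebuild_index(self) -> None:
        # Scan the log once at startup; later records overwrite earlier ones.
        self._file.seek(0)
        while True:
            header = self._file.read(_HEADER.size)
            if len(header) < _HEADER.size:
                break
            key_len, value_len = _HEADER.unpack(header)
            key = self._file.read(key_len)
            self._index[key] = (self._file.tell(), value_len)
            self._file.seek(value_len, os.SEEK_CUR)

    def put(self, key: bytes, value: bytes) -> None:
        # Append the record, then point the index at the new value.
        self._file.seek(0, os.SEEK_END)
        self._file.write(_HEADER.pack(len(key), len(value)))
        self._file.write(key)
        offset = self._file.tell()
        self._file.write(value)
        self._file.flush()
        self._index[key] = (offset, len(value))

    def get(self, key: bytes) -> Optional[bytes]:
        # One hash lookup in memory, then one seek and read on disk.
        entry = self._index.get(key)
        if entry is None:
            return None
        offset, length = entry
        self._file.seek(offset)
        return self._file.read(length)

    def close(self) -> None:
        self._file.close()
```

The sketch never reclaims space taken by overwritten values and does not handle deletes or crashes mid-write; a real storing layer would need compaction and recovery on top of this.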
  • 35. How It Works (Storing Layer) On the board
  • 36. How It Works (Aggregation Layer) • Receives the requests. • Routes each request to the target node. • Manages partitioning and replicas. • Partitioning and replication are done by the consistent hashing algorithm. On the board. Please review reference #6
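
Since the aggregation layer is only drawn on the board, here is a minimal sketch of how consistent hashing can route a key to its target node and pick replica nodes. It is an illustration under stated assumptions, not the implementation of any system in the talk: the class name `ConsistentHashRing`, the use of MD5 for ring positions, and the default of 64 virtual points per node are all choices made for the example.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # Map a string to a position on the ring (first 8 bytes of its MD5 digest).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        self._vnodes = vnodes
        self._ring: List[int] = []        # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node gets several virtual points to spread its share of keys.
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._ring, point)

    def remove_node(self, node: str) -> None:
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            if self._owner.get(point) == node:
                del self._owner[point]
                self._ring.remove(point)

    def nodes_for(self, key: str, replicas: int = 1) -> List[str]:
        # Walk clockwise from the key's position, collecting distinct nodes.
        if not self._ring:
            return []
        start = bisect.bisect_right(self._ring, _hash(key)) % len(self._ring)
        result: List[str] = []
        for offset in range(len(self._ring)):
            node = self._owner[self._ring[(start + offset) % len(self._ring)]]
            if node not in result:
                result.append(node)
                if len(result) == replicas:
                    break
        return result
```

With this scheme, `nodes_for(key)[0]` is the node that serves the key and the remaining entries are where its replicas would go. Adding a node only moves the keys that now fall on the new node's points; every other key keeps its previous owner, which is why a machine can be added without reshuffling the whole data set.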
  • 37. Key-Value Storage Systems • Amazon Dynamo (paper) • Facebook Cassandra (open source) • Tokyo Cabinet/Tyrant (open source) • Redis (open source) • MongoDB (open source)
  • 38. Q/A
  • 39. References 1. The End of an Architectural Era (It’s Time for a Complete Rewrite). Paper. 2. Database Systems - Paul Beynon-Davies. Book. 3. Inside SQL Server engine - MS Press. Book. 4. Drop ACID and Think About Data. Highscalability.com. 5. Redis vs MySQL vs Tokyo Tyrant. Colin Howe’s blog. 6. Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web. Paper. 7. Dynamo: Amazon’s Highly Available Key-value Store. Paper. 8. Redis, Tokyo Tyrant project. 9. Consistent Hashing. Tom White’s blog.
  • 40. Resources 1. High Scalability blog. Highscalability.com 2. It’s all about innovation blog. Hfadeel.com/blog 3. All Things Distributed. Allthingsdistributed.com 4. Tom White’s blog. Lexemetech.com
  • 41. Thanks… Dear all, all of my presentation content is open source. Please feel free to use, copy, and redistribute it.