Comparing Scalable NOSQL Databases
Functionalities and Measurements

Dory Thibault
UCL
Contact: thibault.dory@student.uclouvain.be
Sponsor: Euranova
Website: nosqlbenchmarking.com

February 15, 2011

Motivation
                 Overview of the databases
                              Methodology
                                   Results
                  Summary and conclusion

Clarifications
  Since many people who read these slides did not get the oral
  explanations that MUST go with them, here are a few words of
  warning:
       All the databases were used with their default configurations; I will post them soon on nosqlbenchmarking.com
       No index was set manually; doing so could have a big impact on performance
       Don't jump to conclusions too fast: it would be WRONG to say that Cassandra is very good and that HBase sucks.
       The Cassandra implementation of MapReduce seems to be buggy and does not scale. There must be something wrong with my HBase configuration, since HBase is known to run gigantic clusters without problems.

Clarifications
  Also keep in mind that a benchmark is always biased by the chosen
  methodology, so:
        The way I store data in each database could have an impact on performance
        The summary of the results should not be taken in an absolute way, especially the first one. When I say Good or Bad, it is in THIS particular case. Moreover, raw results are not the only thing that matters: scalability is very important too. So good performance for Cassandra MapReduce without scalability is NOT good.
        The data set is too small, so I am really testing cache performance (but this is the same for all of the databases)
  I will soon add a written analysis and a self-critique of these results on www.nosqlbenchmarking.com

Motivation
  YCSB

  The Yahoo! Cloud Serving Benchmark is the best-known NoSQL
  benchmarking application, so why make another one?


         YCSB uses data generated from statistical distributions
         instead of real data

         YCSB only focuses on read/write/update/scan performance

         YCSB results for elasticity are not conclusive

  Idea

         Data and use case inspired by a concrete case: Wikipedia

         Test read/update performance

         Test MapReduce performance by computing an inverted
         search index

Cassandra 0.6.10




  Overview
  Cassandra is a fully distributed, column-oriented data store that
  provides a MapReduce implementation using Hadoop.


      All the nodes in the cluster play the same role
      The data (existing and new) are sharded automatically among
      the nodes
      The developer can choose the consistency level for each
      request
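Because every Cassandra node plays the same role, placement of rows is decided by a token ring: each node owns a range of tokens, and a row goes to the first node whose token follows the hash of its key. The sketch below is a generic consistent-hashing ring, assuming MD5 tokens (in the spirit of Cassandra's RandomPartitioner), one token per node and no replication; deriving node tokens by hashing the node name is a simplification, and `TokenRing` and the node names are hypothetical. It shows why adding a node only moves the keys in the range that node takes over.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on the ring (MD5 is an assumption here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Minimal consistent-hashing ring: one token per node, no replication."""

    def __init__(self, nodes):
        self._tokens = []  # sorted node tokens
        self._owner = {}   # token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Simplification: the node's token is the hash of its name.
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owner[t] = node

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's token."""
        i = bisect.bisect_right(self._tokens, token(key))
        if i == len(self._tokens):
            i = 0  # wrap around the ring
        return self._owner[self._tokens[i]]


if __name__ == "__main__":
    ring = TokenRing(["node1", "node2", "node3"])
    keys = [str(doc_id) for doc_id in range(20000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node4")
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    # Only the keys now owned by the new node have moved.
    print(len(moved), all(ring.node_for(k) == "node4" for k in moved))
```

The key property is that existing data is redistributed automatically but only locally: the other nodes keep every key they already owned.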




HBase 0.20.6


  Overview
  HBase is a column-oriented database that aims to provide low-latency
  requests on top of Hadoop HDFS.

      An HBase cluster uses several kinds of servers:
             HDFS needs at least one namenode and several datanodes
             HBase needs a ZooKeeper cluster, a master and several regionservers
      The requests must be made to the master(s)
      At the HDFS level, existing data are not sharded
      automatically, but new data are
      At the HBase level, the data are divided into regions that are
      sharded automatically across regionservers
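An HBase table is kept sorted by row key, and each region covers a contiguous key range starting at its start key; finding the region that holds a row therefore amounts to searching the sorted start keys. The sketch below illustrates only that lookup, with hypothetical region boundaries and regionserver names; in a real cluster the client obtains this mapping from HBase's catalog tables, which the sketch does not model. Row keys are compared as bytes, so the IDs are zero-padded here to make byte order match numeric order.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Region:
    start_key: bytes  # inclusive; b"" means "from the beginning of the table"
    server: str       # hypothetical regionserver name


class RegionMap:
    """Row key -> region lookup over regions sorted by start key."""

    def __init__(self, regions):
        self._regions = sorted(regions, key=lambda r: r.start_key)
        self._starts = [r.start_key for r in self._regions]
        if not self._starts or self._starts[0] != b"":
            raise ValueError("the first region must start at the empty key")

    def locate(self, row_key: bytes) -> Region:
        # The owning region is the last one whose start key is <= row_key.
        i = bisect.bisect_right(self._starts, row_key) - 1
        return self._regions[i]


regions = RegionMap([
    Region(b"", "rs1"),
    Region(b"07000", "rs2"),
    Region(b"14000", "rs3"),
])
print(regions.locate(b"00042").server)  # rs1
print(regions.locate(b"07000").server)  # rs2
print(regions.locate(b"19999").server)  # rs3
```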

mongoDB 1.6.5




  Overview

  mongoDB is a document-oriented database that stores JSON
  dictionaries. It provides auto-sharding and a MapReduce
  implementation.


       A mongoDB cluster is made of several kinds of servers:
             The shard servers, which store the data
             The configuration servers, which store the cluster's configuration
             The router servers, which receive and route the requests
       Existing and new data are sharded automatically

       MapReduce can only use one thread per server
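The router servers decide where a request goes using the chunk ranges held by the configuration servers: a query that contains the shard key can be sent to the single shard owning that range, while a query that does not must be broadcast to every shard and the results merged. The sketch below illustrates that routing decision; the chunk table, the shard names and the use of the article ID (`_id`) as shard key are assumptions for illustration, not the configuration used in this benchmark.

```python
import bisect
from typing import Dict, List, Optional

# Hypothetical chunk table: chunk i covers [CHUNK_LOWER_BOUNDS[i], next bound).
# The first chunk is treated as starting at minus infinity.
CHUNK_LOWER_BOUNDS: List[int] = [0, 7000, 14000]
CHUNK_SHARDS: List[str] = ["shard0", "shard1", "shard2"]
ALL_SHARDS: List[str] = sorted(set(CHUNK_SHARDS))


def shards_for_query(query: Dict[str, object], shard_key: str = "_id") -> List[str]:
    """Return the shards a router would contact for this query."""
    value: Optional[object] = query.get(shard_key)
    if isinstance(value, int):
        # Targeted query: exactly one chunk, hence one shard, can hold the document.
        i = bisect.bisect_right(CHUNK_LOWER_BOUNDS, value) - 1
        return [CHUNK_SHARDS[max(i, 0)]]
    # No exact shard key in the query: scatter to every shard, then gather.
    return ALL_SHARDS


print(shards_for_query({"_id": 42}))          # ['shard0']
print(shards_for_query({"title": "Belgium"}))  # ['shard0', 'shard1', 'shard2']
```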




Riak 0.14



  Overview
  Riak is a fully distributed key/value store, organized into buckets,
  with an implementation of MapReduce.


      Buckets can store the data directly or be a link to another
      bucket
      All the nodes in the cluster play the same role
      The data (existing and new) are sharded automatically
      among the nodes
      The developer can choose the consistency level for each
      request
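In Dynamo-style stores such as Riak, each value is stored on N replicas, and choosing a consistency level per request amounts to choosing R (replicas that must answer a read) and W (replicas that must acknowledge a write). In the classic quorum model, ignoring Riak's sloppy quorums and hinted handoff, a read is guaranteed to overlap the latest acknowledged write when R + W > N. The helper below is a hypothetical illustration of that rule, not Riak's client API; N = 3 matches Riak's default `n_val`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consistency:
    """Hypothetical per-request settings, named after Riak's n_val / r / w."""
    n: int  # replicas stored for each value
    r: int  # replicas that must answer a read
    w: int  # replicas that must acknowledge a write

    def __post_init__(self) -> None:
        if not (1 <= self.r <= self.n and 1 <= self.w <= self.n):
            raise ValueError("r and w must be between 1 and n")

    def reads_see_latest_write(self) -> bool:
        # Any R replicas and any W replicas out of N share at least
        # one node exactly when R + W > N (pigeonhole principle).
        return self.r + self.w > self.n


def quorum(n: int) -> int:
    """Strict majority of n replicas."""
    return n // 2 + 1


N = 3  # Riak's default n_val
print(Consistency(N, quorum(N), quorum(N)).reads_see_latest_write())  # True: 2 + 2 > 3
print(Consistency(N, 1, 1).reads_see_latest_write())                  # False: 1 + 1 <= 3
```

Lower R and W make requests faster and more available at the cost of possibly stale reads, which is the trade-off the per-request setting exposes.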



The data

  Wikipedia export

  20,000 pages downloaded from Wikipedia



       Every document is in XML format

       All documents add up to 620 MB

       Each document is associated with a single integer ID
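For a sense of scale, assuming decimal units (1 MB = 1000 KB), the average document size works out to:

```latex
\[
\bar{s} \approx \frac{620\ \text{MB}}{20\,000\ \text{documents}} = 0.031\ \text{MB} \approx 31\ \text{KB per document}
\]
```

This is only an average; individual Wikipedia articles vary widely in length.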


  Insertions

  Each document is inserted only once during the whole benchmark




The client

  Overview
      Fully random requests
      Acts as a perfect load balancer
      The proportion of updates can be specified
      Specific parts: read/write/update and MapReduce

  Updates
  The updates simply append the string "1" to the end of the
  article.
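The client behaviour described above can be sketched as follows: each request picks a uniformly random document ID and a uniformly random node (which is what lets the client act as a perfect load balancer), chooses between a read and an update according to the configured read percentage, and performs an update by appending "1" to the article. This is a sketch against a hypothetical in-memory `Node` stand-in, not the actual benchmark client; implementing the update as a read followed by a write is an assumption.

```python
import random
from typing import Dict, List


class Node:
    """Hypothetical stand-in for one database node (real runs use a DB driver)."""

    def __init__(self, store: Dict[int, str]):
        self.store = store  # shared dict: any node can serve any key
        self.requests = 0

    def read(self, doc_id: int) -> str:
        self.requests += 1
        return self.store[doc_id]

    def write(self, doc_id: int, text: str) -> None:
        self.requests += 1
        self.store[doc_id] = text


def run_benchmark(nodes: List[Node], doc_ids: List[int], total_requests: int,
                  read_percentage: float, rng: random.Random) -> int:
    """Issue fully random requests; return how many were updates."""
    updates = 0
    for _ in range(total_requests):
        node = rng.choice(nodes)      # uniform node choice: perfect load balancing
        doc_id = rng.choice(doc_ids)  # uniform document choice
        if rng.random() * 100 < read_percentage:
            node.read(doc_id)
        else:
            # Update: read the article and append the string "1" to it.
            node.write(doc_id, node.read(doc_id) + "1")
            updates += 1
    return updates


if __name__ == "__main__":
    store = {doc_id: "article text" for doc_id in range(20000)}
    nodes = [Node(store) for _ in range(3)]
    n_updates = run_benchmark(nodes, list(store), 10000, 80.0, random.Random(42))
    print(n_updates, [n.requests for n in nodes])
```

Passing a seeded `random.Random` makes a run reproducible, which helps when comparing databases under the same request sequence.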



MapReduce
 Overview
 MapReduce is used to build a reverse index for a given keyword.
 The reverse index is a list of pairs made of:
      ID: the ID of the article if Count ≠ 0
      Count: the number of occurrences of the keyword in this
      article
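The map and reduce steps for this reverse index can be sketched in plain Python: the map step counts the keyword in one article and emits a pair only when the count is non-zero, and the reduce step gathers all pairs for the keyword into a single list. This runs in one process and is not tied to any of the four databases' MapReduce APIs; counting whole, case-insensitive words is an assumption, since the slides do not say how occurrences were matched.

```python
import re
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Pair = Tuple[int, int]  # (article ID, number of occurrences)


def map_article(doc_id: int, text: str, keyword: str) -> Iterator[Tuple[str, Pair]]:
    """Emit (keyword, (ID, Count)) only if the keyword occurs in the article."""
    count = len(re.findall(rf"\b{re.escape(keyword)}\b", text, flags=re.IGNORECASE))
    if count != 0:
        yield keyword, (doc_id, count)


def reduce_pairs(keyword: str, pairs: List[Pair]) -> Tuple[str, List[Pair]]:
    """Collect every (ID, Count) pair for the keyword, sorted by article ID."""
    return keyword, sorted(pairs)


def reverse_index(articles: Dict[int, str], keyword: str) -> List[Pair]:
    grouped: Dict[str, List[Pair]] = defaultdict(list)
    for doc_id, text in articles.items():  # map phase: every document is read
        for key, pair in map_article(doc_id, text, keyword):
            grouped[key].append(pair)       # shuffle: group by emitted key
    _, index = reduce_pairs(keyword, grouped.get(keyword, []))
    return index


articles = {
    1: "Belgium is a country. Brussels is the capital of Belgium.",
    2: "Louvain-la-Neuve is a town.",
    3: "The Belgium national team.",
}
print(reverse_index(articles, "belgium"))  # [(1, 2), (3, 1)]
```

The map phase touches every document, which is what makes this job a useful test of each database's MapReduce implementation.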
 Justification
 This kind of computation implies that all the documents are crawled, and it takes advantage of the speci…

The methodology
   1  Start up a clean cluster of size 3 and insert all the documents
   2  Choose a total number of requests and a read percentage, then start the benchmark
   3  Wait one minute and start the benchmark again
   4  Wait five minutes and start the benchmark again
   5  Start the MapReduce benchmark
   6  Add a new node to the cluster, wait for it to be ready, then immediately restart the benchmark with the new node's IP added to the list
   7  Jump to step 3 until there are no more computers to add to the cluster
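The procedure above is essentially a loop over cluster sizes. The sketch below captures its control flow only; `start_cluster`, `run_read_update`, `run_mapreduce` and `add_node` are hypothetical callbacks standing in for the real setup and benchmark code, and `sleep` is injectable so the waits can be shortened when trying the sketch out.

```python
import time
from typing import Callable, List


def run_methodology(initial_nodes: List[str], spare_nodes: List[str],
                    start_cluster: Callable[[List[str]], None],
                    run_read_update: Callable[[List[str]], None],
                    run_mapreduce: Callable[[List[str]], None],
                    add_node: Callable[[str], None],
                    sleep: Callable[[float], None] = time.sleep) -> None:
    cluster = list(initial_nodes)
    spares = list(spare_nodes)
    start_cluster(cluster)        # 1. clean cluster, insert all the documents
    run_read_update(cluster)      # 2. first read/update run
    while True:
        sleep(60)                 # 3. wait one minute, run again
        run_read_update(cluster)
        sleep(5 * 60)             # 4. wait five minutes, run again
        run_read_update(cluster)
        run_mapreduce(cluster)    # 5. MapReduce benchmark
        if not spares:            # 7. stop when no computer is left to add
            break
        node = spares.pop(0)
        add_node(node)            # 6. add a node; the callback waits until it is ready
        cluster.append(node)      #    the new node's IP joins the client's list
        run_read_update(cluster)  #    restart the benchmark immediately


if __name__ == "__main__":
    def log(name: str) -> Callable[[object], None]:
        return lambda arg: print(name, arg)

    run_methodology(["n1", "n2", "n3"], ["n4"], log("start"), log("read/update"),
                    log("mapreduce"), log("add"), sleep=lambda s: None)
```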

Read/update results

Read/update results without HBase

MapReduce performance

The HBase case
  Verifications made:
      Checked the logs: nothing seemed problematic
      HDFS level: running the balancer with a very low threshold distributed the blocks evenly, but without any impact on performance
      HBase level: the regions were always nearly evenly distributed across the regionservers
      The number of rows did not change and the content of each row was correct
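For reference, the HDFS balancer considers the cluster balanced when every datanode's utilization (used space divided by capacity) differs from the utilization of the whole cluster by no more than the threshold, in percentage points; a very low threshold therefore forces blocks to be spread almost perfectly evenly. The sketch below only reproduces that criterion on made-up numbers; `is_balanced` is a hypothetical helper and does not talk to HDFS.

```python
from typing import Dict, Tuple


def utilization(used: float, capacity: float) -> float:
    """Percentage of capacity in use."""
    return 100.0 * used / capacity


def is_balanced(datanodes: Dict[str, Tuple[float, float]], threshold: float) -> bool:
    """datanodes maps name -> (used, capacity), both in the same unit."""
    total_used = sum(used for used, _ in datanodes.values())
    total_capacity = sum(cap for _, cap in datanodes.values())
    cluster = utilization(total_used, total_capacity)
    return all(abs(utilization(u, c) - cluster) <= threshold
               for u, c in datanodes.values())


# Hypothetical datanodes: (used GB, capacity GB); cluster utilization is 35 %.
nodes = {"dn1": (40.0, 100.0), "dn2": (35.0, 100.0), "dn3": (30.0, 100.0)}
print(is_balanced(nodes, threshold=10.0))  # True: every node is within 5 points
print(is_balanced(nodes, threshold=1.0))   # False
```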

Summary of raw performances

  DB          Read/update performance   MapReduce performance
  Cassandra   Good                      Very good
  HBase       Bad / N.A.                Average / N.A.
  mongoDB     Good                      Poor but scalable
  Riak        Poor / unstable           Average but scalable

Summary of scalability
  Going from 3 to 8 servers multiplies capacity by 8/3, i.e. about 267% of the original capacity (a 167% increase). Here are the observed increases in performance:

  DB                    Read/update   MapReduce
  Cassandra             153%          112%
  HBase                 11%           43%
  mongoDB               145%          211%
  Riak                  74%           189%
  Riak (7 nodes max)    155%          168%
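For reference, the capacity change from 3 to 8 servers can be expressed either as a ratio or as an increase:

```latex
\[
\frac{8}{3} \approx 2.667 \;\Rightarrow\; \text{new capacity} \approx 266.7\%\ \text{of the old one},
\qquad
\frac{8-3}{3} \approx 1.667 \;\Rightarrow\; \text{an increase of} \approx 166.7\%
\]
```

Which of these two conventions the observed percentages follow is not stated, so the comparison between the capacity figure and the table should be read with that in mind.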

Conclusion and future work
  Conclusion
      The elastic gain seems more apparent than with YCSB, but it is not linear either
      It is worth testing MapReduce performance, as the results vary a lot between databases, both in raw performance and in scalability
  Future work
  This is still a work in progress:
      Applying this benchmark to other databases (Terrastore, Voldemort, Scalaris ...)
      Trying with a growing/bigger data set

Questions and remarks
  Any questions or remarks?