NoSQL Findings
                                 Christian van der Leeden




Thursday, September 23, 2010
Our problem
                    • Growth is not linear and not predictable

                          • e.g. History::Session table now > 30 million entries

                          • Activities table > 26 million entries

                    • Postgres will be the performance bottleneck




Criteria
                    • Allow us to scale from 100k Daily Active Users (DAU)
                      to 1 million DAU and up to 10 million DAU

                    • Scale horizontally (“Just add servers”)

                    • Good ruby performance

                    • Good transition from Rails/Postgres -> Rails/NoSQL

                    • Actively developed




Goal
                    • Scores (@ 10 million Daily Active Users)

                          • 10 million scores/day: plan for about 350 inserts/second
                            (the 24-hour average is about 116/second)

                          • around the same read rate for Leaderboards

                    • Game with 10 million players

                          • Leaderboard with 10 million entries

                    • Session (@ 10 million DAU)

                          • > 10 million session handshakes/day
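
A quick back-of-the-envelope check of these rates, as a sketch only. The 8-hour busy window is an assumption made here to show how 10 million scores per day could translate into roughly 350 inserts/second; the slides do not say how that figure was derived.

```ruby
# Back-of-the-envelope write rates for 10 million scores per day.
# ASSUMPTION: the 8-hour busy window is illustrative, not from the slides.
SCORES_PER_DAY   = 10_000_000
SECONDS_PER_HOUR = 3600

# Average events per second if `per_day` events arrive within `hours` hours.
def rate_per_second(per_day, hours)
  per_day.to_f / (hours * SECONDS_PER_HOUR)
end

average = rate_per_second(SCORES_PER_DAY, 24) # spread evenly over the day
busy    = rate_per_second(SCORES_PER_DAY, 8)  # squeezed into 8 busy hours

puts format("24h average: %.0f inserts/second", average)
puts format("8h window:   %.0f inserts/second", busy)
```

Sizing for the busy-window figure rather than the daily average leaves headroom for peak hours.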



Data Patterns
                    • Most data is accessed time based (the most recent
                      data is accessed the most often)

                    • Write-Read rate is around the same

                    • Eventual consistency is good enough most of the
                      time




Rating criteria
                    •     Type (Document Store, Key/Value Store, Big Table)

                    •     Deployment

                          •    How easy is it to scale?

                    •     Existing installations

                          •    How big are known installations?

                    •     Heritage and activity

                          •    Where does the solution come from and how actively is it
                               developed by whom?




Products evaluated
                    • MongoDB

                    • Redis

                    • Cassandra

                    • HBase

                    • Membase




MongoDB
                    • document store

                    • “SQL DB” without relations

                    • easy transition with MongoMapper, Mongoid

                    • supports sharding over replica sets (since August
                      2010)

                    • Haven’t found a big sharded server installation




Experience with Mongo
          • nice/easy to program with

          • deployment woes we’ve encountered (1.6.0)

                • segmentation fault

                • cannot read because: invalid BSON object

                • when index is > RAM performance degradation (from
                  20ms to 200 ms for queries)

                • Global write lock makes data migrations slow




Cassandra
                    •     Big Table data store

                    •     Was developed by Facebook and is actively maintained

                    •     Easy to add servers and to setup (peer to peer concept)

                    •     Thrift API to Ruby was slow in tests (Our tests: around 150 write
                          ops/second)

                    •     Avro API promises to be faster (will be an option in 0.7)

                    •     Used by Facebook

                    •     Not using it because it is too slow with ruby




Redis
                    • Memcache with simple persistence

                    • Supports many different data types and atomic
                      operations on them

                    • Sharding is done client side (difficult to add new
                      servers)

                    • We’re using it for indexes on SQL data

                    • Very fast (Our tests: 4000 write operations/second)
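
Why client-side sharding makes adding servers difficult can be sketched in a few lines. This is an illustration under assumptions, not the setup from the talk: the server names, the key format, and the choice of CRC32 modulo the server count are all hypothetical.

```ruby
require "zlib"

# Client-side sharding sketch: the client picks a server by hashing the key
# modulo the number of servers. Server names and key format are hypothetical.
def server_for(key, servers)
  servers[Zlib.crc32(key) % servers.size]
end

# Fraction of keys whose server changes when the server list changes.
def moved_fraction(keys, old_servers, new_servers)
  moved = keys.count { |k| server_for(k, old_servers) != server_for(k, new_servers) }
  moved.to_f / keys.size
end

keys  = (1..10_000).map { |i| "score:#{i}" }
three = %w[redis-a redis-b redis-c]
four  = three + %w[redis-d]

fraction = moved_fraction(keys, three, four)
puts format("%.0f%% of keys move when a fourth server is added", fraction * 100)
```

With plain modulo hashing most keys end up on a different server after the change, so the data has to be migrated. Schemes such as consistent hashing are commonly used to reduce this reshuffling.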



HBase
                    • Big Table Database

                    • Complex to setup and to maintain

                    • Very often used for Analytics Jobs with Hadoop/Hive,
                      e.g. on Amazon Elastic MapReduce

                    • For Analytics also look at Scribe for data collection




Membase
                    • Key-Value Store

                    • Distributed, persistent Memcache

                    • Easy to add nodes

                    • Used by Zynga




Example Leaderboards
                    • User has many scores

                    • Each score has one result (integer)

                    • Game has many scores

                    •      Query: the leaderboard for one game

                          • Insert one score into the leaderboard

                          • What is my rank?

                          • Give me 10 scores starting at position 100,000
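
The three operations can be pinned down with a small in-memory reference model. This is a sketch for illustration: the class and method names are hypothetical, and a sorted Array stands in for whichever store is being evaluated.

```ruby
# In-memory reference model of the three leaderboard operations.
# Illustrative only: class and method names are hypothetical, and a sorted
# Array stands in for whatever store is being evaluated.
class Leaderboard
  Score = Struct.new(:id, :user_id, :result)

  def initialize
    @scores = [] # kept sorted by result, highest first
  end

  # Insert one score, keeping the list sorted (binary search for the slot).
  def insert(score)
    index = @scores.bsearch_index { |s| s.result < score.result } || @scores.size
    @scores.insert(index, score)
  end

  # Zero-based rank: how many scores have a strictly higher result.
  def rank(score)
    @scores.bsearch_index { |s| s.result <= score.result } || @scores.size
  end

  # `count` scores starting at zero-based `position`.
  def page(position, count)
    @scores[position, count] || []
  end
end

board = Leaderboard.new
[[2563, 52345, 100], [6752, 3652, 96], [96877, 2541, 99]].each do |id, user, result|
  board.insert(Leaderboard::Score.new(id, user, result))
end
```

Each store in the following slides has to answer the same three calls, only at 10 million entries and across servers.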



SQL vs NoSQL
                    SQL                        NoSQL

                    • Think about Data         • Think about Queries

                    • Redundancy is bad        • Redundancy is ok

                    • Indexes are managed by   • Roll your own indexes
                      the DB                     depending on queries

                    • Query over relations     • No Joins and connecting
                                                 entities

                    • Always exact results     • Query results don’t have to
                                                 return latest write
                                                 operation



SQL vs NoSQL
                    SQL                    NoSQL

                    • standardized query   • some solutions share
                      language and DDL       standards

                    • All DBs are “the     • Many different
                      same”                  approaches

                                             • Document store

                                             • Big Table

                                             • Key Value



Postgres
                                  User  1 --- n  Score  n --- 1  Game

                    •     Create new score:
                          score = Score.new(attributes)
                          score.save   => insert into scores ...;

                    •     What is my rank?
                          select count(*) from scores inner join games on (games.id =
                          scores.game_id)
                          where result > #{my_score.result} and games.name = '#{game_name}'

                    •     Give me 10 scores in leaderboard from position 100000
                          select scores.* from scores inner join games on (games.id = scores.game_id)
                          where games.name = '#{game_name}'
                          order by result desc
                          offset 100000 limit 10;




Redis
    SortedSet
    key: game_name
    score: result
    value: score_id

      key: "Jewels"
             100            99            96
           <2563>        <96877>        <6752>       ...
      key: "Bug Landing"
      key: "Toss It"
      ...

    KeyValue Store
    key: score_id
    value: marshalled score object

              2563: { result : 100, user_id : 52345, game_id: 57142 }
              96877: { result : 99, user_id : 2541, game_id: 57142 }
              6752: { result : 96, user_id : 3652, game_id: 57142 }

                    • New Score
                      redis.zadd("Jewels", result, score_id)

                    • My Rank?
                      redis.zrevrank("Jewels", score_id)

                    • 10 scores from position 100000
                      redis.zrevrange("Jewels", 100000, 100009)
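
A minimal sketch of how the sorted set and the key/value entries could be combined behind one class. It assumes a client object that responds to `zadd`, `zrevrank`, `zrevrange`, `set` and `mget` the way the redis-rb gem does; the class name, the `score:<id>` key prefix and the use of JSON instead of marshalling are illustrative choices, not taken from the slides.

```ruby
require "json"

# Leaderboard on a Redis sorted set plus one key per score object.
# ASSUMPTIONS: `redis` is any client that responds to zadd, zrevrank,
# zrevrange, set and mget the way the redis-rb gem does (e.g. Redis.new);
# the "score:<id>" key prefix and JSON encoding are illustrative choices.
class RedisLeaderboard
  def initialize(redis, game_name)
    @redis = redis
    @game_name = game_name
  end

  # Store the score object, then index its id under the result.
  def add(score_id, attributes)
    @redis.set("score:#{score_id}", JSON.generate(attributes))
    @redis.zadd(@game_name, attributes.fetch(:result), score_id)
  end

  # Zero-based rank, highest result first; nil if the id is unknown.
  def rank(score_id)
    @redis.zrevrank(@game_name, score_id)
  end

  # `count` score objects starting at zero-based `position`.
  # ZREVRANGE takes inclusive start and stop indexes, not offset and count.
  def page(position, count)
    ids = @redis.zrevrange(@game_name, position, position + count - 1)
    return [] if ids.empty?

    @redis.mget(*ids.map { |id| "score:#{id}" }).map { |raw| JSON.parse(raw) }
  end
end
```

Note that `zrevrank` is looked up by member (the score id), not by the result value, and that the write is two separate commands, so a reader may briefly see one without the other.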




Mongo
                                 Collection

                                 key: Scores


                                       { _id: 2563, result : 100, user_id : 52345, game_id: 57142 }
                                       { _id: 96877, result : 99, user_id : 2541, game_id: 57142 }
                                        { _id: 6752, result : 96, user_id : 3652, game_id: 57142 }




                    •     New Score
                          Score.create!(attributes)
                          db.scores.insert( { result: 100, user_id: 52345,
                          game_id: 57142 } )

                    •     What is my rank?
                          db.scores.count( { game_id: 57142, result: { $gt: #{my_score.result} } } )

                    •     10 scores from position 100000
                          db.scores.find( { game_id: 57142 } ).sort({ result: -1 })
                          .skip(100000).limit(10)
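
From Ruby, the same two reads can be expressed as plain hashes before they are handed to a driver. The helper names below are hypothetical and the field names follow the slide; with the official mongo gem such hashes could be passed to `collection.count_documents(filter)` and `collection.find(filter).sort(...).skip(...).limit(...)`, but that wiring is left out so the sketch runs without a server.

```ruby
# Query documents for the two leaderboard reads, built as plain Ruby hashes.
# ASSUMPTIONS: field names follow the slide; the helper names are
# hypothetical and no database connection is made here.

# Scores in the same game with a strictly higher result than `result`.
# Counting the matches gives the zero-based rank.
def rank_filter(game_id, result)
  { game_id: game_id, result: { "$gt" => result } }
end

# Filter, sort order, offset and page size for one leaderboard page.
def page_query(game_id, position, count)
  { filter: { game_id: game_id }, sort: { result: -1 }, skip: position, limit: count }
end

p rank_filter(57142, 99)
p page_query(57142, 100_000, 10)
```

Both reads filter on `game_id` and order by `result`, so a compound index on those two fields is the natural candidate. A large `skip` still has to walk past the skipped entries, which matters at position 100000.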




Cassandra
    ColumnFamily: Leaderboards

    row_key: game_name

       row_key: "Jewels"
            100: 2563       99: 96877   96: 6752

       row_key: "Bug Landing"

       row_key: "Toss It"
       ...

    ColumnFamily: Scores

    row_key: score_id

       row_key: 2563
            game_id: 57142   result: 100   user_id: 6325

       row_key: 96877
            game_id: 57142   result: 99    user_id: 2375

       row_key: 6752
            game_id: 57142   result: 96    user_id: 2311
       ...




Cassandra
    ColumnFamily: Leaderboards

    row_key: game_name

       row_key: "Jewels"
            100: 2563       99: 96877   96: 6752

       row_key: "Bug Landing"

       row_key: "Toss It"
       ...

                    • Insert new score:
                          client.insert("Leaderboards", "Jewels", { result => id })
                          client.insert("Scores", id, { :result => result, :user_id =>
                          user_id, :game_id => game_id })

                    • What is my rank?
                      => not easy, need help from other tools

                    • Give me the next 10 scores starting at score X
                          client.get("Leaderboards", "Jewels", :start =>
                          X.result, :count => 10)




Findings
                    • Use and test the tools you want to use on the scale
                      you are going to use them

                    • There is no “Best NoSQL” solution

                    • Mix and match the tools you need

                    • NoSQL requires a lot of rethinking and change in
                      your Ruby Code.




Links
                    •     Cassandra: http://cassandra.apache.org/

                    •     Cassandra API: http://wiki.apache.org/cassandra/API

                    •     Twitter on Cassandra: http://github.com/ericflo/twissandra

                    •     Redis: http://code.google.com/p/redis/

                    •     Redis API: http://code.google.com/p/redis/wiki/CommandReference

                    •     Membase: http://www.membase.org/

                    •     HBase: http://hbase.apache.org/

                    •     Scribe: http://github.com/facebook/scribe

                    •     Mongo: http://www.mongodb.org/



