Harri Kauhanen
2010-03-09




   NoSQL databases
Database paradigms

• Relational (RDBMS)
• NoSQL
 •   Key-value stores

 •   Document databases

 •   Wide column stores (BigTable and clones)

 •   Graph databases


• Others
Relational databases

• ACID (Atomicity, Consistency, Isolation and
  Durability)
• SQL
• MySQL, PostgreSQL, Oracle, ...
dog                  bark                 comment
  id: integer          id: integer          id: integer
  name: varchar        dog_id: integer      bark_id: integer
  mood: varchar        bark: text           dog_id: integer
  birthdate: date                           comment: text
  color: enum

id   name     mood     birth_date   color
12   Stella   Happy    2007-04-01   NULL
13   Wimma    Hungry   NULL         black
 9   Ninja    NULL     NULL         NULL
Key-value stores
• “One key, one value, no duplicates, and
  crazy fast.”
• It is a Hash!
• The value is a binary object, a.k.a. “blob” – the
  DB does not understand it and does not
  want to understand it.
• Amazon Dynamo, MemcacheDB, ...
“key”      “value”

dog_12     ...
           name_€#_Stella^^^
           mood_€#_Happy^^^
           birthdate%///
           135465645)
           ...
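The "persistent hash" idea can be sketched in a few lines of JavaScript. This is only an in-memory illustration, not how Amazon Dynamo or MemcacheDB work internally; the `KeyValueStore` class and the choice of JSON as the serialization format are assumptions made for the example.

```javascript
// Minimal in-memory sketch of a key-value store (hypothetical class, not a real product API).
class KeyValueStore {
  constructor() {
    this.data = new Map(); // one key -> one value, no duplicates
  }
  put(key, blob) {
    this.data.set(key, blob); // the store never inspects the blob
  }
  get(key) {
    return this.data.get(key);
  }
}

// The application, not the store, decides how to serialize the value.
const store = new KeyValueStore();
const dog = { name: 'Stella', mood: 'Happy', birthdate: '2007-04-01' };
store.put('dog_12', Buffer.from(JSON.stringify(dog)));

// Reading it back: the store returns opaque bytes and the application decodes them.
const restored = JSON.parse(store.get('dog_12').toString());
console.log(restored.name);
```

The only lookup the store offers is by key; finding all happy dogs would mean fetching and decoding every value in the application.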
Document databases

• Key-value store, but the value is (usually)
  structured and “understood” by the DB.
• Querying data is possible (by other means
  than just a key).
• Amazon SimpleDB, CouchDB, MongoDB,
  Riak, ...
“key”      “document”

dog_12     {
             type: “Dog”,
             name: “Stella”,
             mood: “Happy”,
             birthdate: 2007-04-01
           }

vs.

id   name     mood     birth_date   color
12   Stella   Happy    2007-04-01   NULL
13   Wimma    Hungry   NULL         black
 9   Ninja    NULL     NULL         NULL
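Because the database understands the structure of a document, it can answer queries on fields other than the key. The sketch below illustrates that with a plain in-memory `Map`; the `findByFields` helper and the keys `dog_13` and `dog_9` are assumptions for the example, and real document databases expose their own query mechanisms. Note that the documents do not share a fixed set of fields: missing attributes are simply left out rather than stored as NULL.

```javascript
// Hypothetical in-memory document store; the rows mirror the relational table on the slide.
const documents = new Map([
  ['dog_12', { type: 'Dog', name: 'Stella', mood: 'Happy', birthdate: '2007-04-01' }],
  ['dog_13', { type: 'Dog', name: 'Wimma', mood: 'Hungry', color: 'black' }],
  ['dog_9', { type: 'Dog', name: 'Ninja' }],
]);

// Query by field values rather than by key.
function findByFields(criteria) {
  const matches = [];
  for (const [key, doc] of documents) {
    const ok = Object.entries(criteria).every(([field, value]) => doc[field] === value);
    if (ok) matches.push(key);
  }
  return matches;
}

const happyDogs = findByFields({ type: 'Dog', mood: 'Happy' });
const blackDogs = findByFields({ color: 'black' });
console.log(happyDogs, blackDogs);
```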
“key”      “document”

dog_12     {
             type: “Dog”,
             name: “Stella”,
             mood: “Happy”,
             birthdate: 2007-04-01,
             barks: [
               {
                 bark: “I had to wear stupid..”,
                 comments: [
                   {
                     dog_id: “dog_4”,
                     comment: “You look so cute!”
                   }, {
                     dog_id: “dog_14”,
                     comment: “I hate it, too!”
                   }
                 ]
               }
             ]
           }
Wide column stores

• Often referred to as “BigTable clones”
• "a sparse, distributed multi-dimensional
  sorted map"
• Google BigTable, Cassandra (Facebook),
  HBase, ...
                     “column”
“row-id”   “column family”   “title”     “time”   “value”

dog_12     dog               birthdate   15       2007-04-01
           dog               mood        11       Angry
           dog               mood        45       Happy
           dog               name        25       Stella
           dog               color       34       Black

           bark              text        11       I had to wear...
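The table above can be read as a map from (row-id, column family, title, time) to a value. The following sketch stores the slide's cells in a plain array and looks one up; the `lookup` function is hypothetical, and treating a supplied time as "the version current at that time" is an assumption about the semantics, not a statement about any particular product.

```javascript
// Cells from the slide, keyed by row-id, column family, title and time.
const cells = [
  { row: 'dog_12', family: 'dog', title: 'birthdate', time: 15, value: '2007-04-01' },
  { row: 'dog_12', family: 'dog', title: 'mood', time: 11, value: 'Angry' },
  { row: 'dog_12', family: 'dog', title: 'mood', time: 45, value: 'Happy' },
  { row: 'dog_12', family: 'dog', title: 'name', time: 25, value: 'Stella' },
  { row: 'dog_12', family: 'dog', title: 'color', time: 34, value: 'Black' },
  { row: 'dog_12', family: 'bark', title: 'text', time: 11, value: 'I had to wear...' },
];

// Look up row-id/column-family:title[/time].
// Without a time, the latest version wins; with a time, the newest version
// written at or before that time is returned (assumed semantics).
function lookup(row, family, title, time = Infinity) {
  const versions = cells
    .filter((c) => c.row === row && c.family === family && c.title === title && c.time <= time)
    .sort((a, b) => b.time - a.time);
  return versions.length > 0 ? versions[0].value : undefined;
}

const latestMood = lookup('dog_12', 'dog', 'mood');
const moodAtTime20 = lookup('dog_12', 'dog', 'mood', 20);
console.log(latestMood, moodAtTime20);
```

Adding a new title such as `color` needs no schema change: it is just one more cell for the row.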
Graph databases


• “A relational database is a collection of loosely
  connected tables” whereas “a graph
  database is a multi-relational graph”.
• Neo4j, InfoGrid, ...
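Nodes and labelled edges:

  dog_12      --type--        Dog
  dog_12      --name--        Stella
  dog_12      --mood--        Happy
  dog_12      --birth_date--  2007-04-01
  dog_12      --barks--       bark_59      (I had to wear stupid...)
  comment_83  --comment_to--  bark_59
  dog_4       --comments--    comment_83   (You look so Cute)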
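A graph traversal over the picture above could look roughly like this. The edges are stored as (from, label, to) triples; the edge directions are an assumption, since the slide does not show arrowheads, and the helper names `out`, `into` and `commentersOn` are invented for the sketch. A real graph database such as Neo4j has its own traversal API.

```javascript
// Edges from the slide's picture as (from, label, to) triples (directions assumed).
const edges = [
  ['dog_12', 'type', 'Dog'],
  ['dog_12', 'name', 'Stella'],
  ['dog_12', 'mood', 'Happy'],
  ['dog_12', 'birth_date', '2007-04-01'],
  ['dog_12', 'barks', 'bark_59'],
  ['comment_83', 'comment_to', 'bark_59'],
  ['dog_4', 'comments', 'comment_83'],
];

// Follow edges with a given label, forwards (out) or backwards (into).
const out = (node, label) => edges.filter(([f, l]) => f === node && l === label).map((e) => e[2]);
const into = (node, label) => edges.filter(([, l, t]) => t === node && l === label).map((e) => e[0]);

// Traversal: dog -> its barks -> comments on those barks -> dogs who wrote the comments.
function commentersOn(dog) {
  return out(dog, 'barks')
    .flatMap((bark) => into(bark, 'comment_to'))
    .flatMap((comment) => into(comment, 'comments'));
}

console.log(commentersOn('dog_12'));
```

The point of the sketch is that the query is expressed as hops along relationships rather than as joins between tables.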
• Relationships in RDBMS are “weak”.
  •   You may “define” one by using constraints,
      documenting a relationship, writing code, using
      naming conventions etc.

• Relationships in graph databases are
  first-class citizens.
• There are no relationships in key-value
  stores, document databases and wide
  column stores.
  •   You may “define” one by using validations,
      documenting a relationship, writing code, using
      naming conventions etc.
• Relational databases have almost limitless
  indexing, and a very strong language for
  dynamic, cross-table, queries (SQL)
  •   That’s why they handle all kinds of relationships
      well and dynamically.

• NoSQL databases...
  •   ...might have limited support for dynamic queries
      and indexing

  •   ...don’t support the JOIN-like operations of SQL

  •   ...but you can store some relationships in the
      document itself
How to query NoSQL?
• Key-Value
• Row-id/column-family:title[/time]
  •   “stella_12”/“dogs”:“name” → Stella

• Graph traversal
• API
• Query-language
• Integration to indexing and search engines
• Map-Reduce
Map-Reduce
• “MapReduce is a programming model and
  an associated implementation for
  processing and generating large data sets.”
• Often JavaScript (NoSQL implementations)
Map-function
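  function map(doc) {
    // Only dog documents contribute to the view.
    if (doc.type === 'Dog') {
      // emit(key, value) adds one row to the view: mood is the key, birthdate the value.
      emit(doc.mood, doc.birthdate);
    }
  }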


• Generates an “indexed view” of data/documents
• This view is just another hash, but both key
  and value can be “anything”
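To make the "view is just another hash" idea concrete, here is a sketch of how a database might run a map function over its documents. `buildView` is a hypothetical helper, the second dog (Rex) is made-up sample data, and `emit` is passed in as a parameter here purely to keep the sketch self-contained, whereas the slide's version assumes the database provides it.

```javascript
// Run a map function over every document and collect the emitted key/value rows.
function buildView(docs, mapFn) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  for (const doc of docs) {
    mapFn(doc, emit); // each document is handled on its own, so this loop is easy to distribute
  }
  // Keep the rows sorted by key so that lookups by key can be fast.
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const docs = [
  { type: 'Dog', name: 'Stella', mood: 'Happy', birthdate: '2007-04-01' },
  { type: 'Dog', name: 'Rex', mood: 'Angry', birthdate: '2005-10-12' }, // invented sample
  { type: 'Bark', bark: 'I had to wear stupid..' },
];

const view = buildView(docs, (doc, emit) => {
  if (doc.type === 'Dog') emit(doc.mood, doc.birthdate);
});

// Querying the view by key: birth dates of happy dogs.
const happyBirthdates = view.filter((row) => row.key === 'Happy').map((row) => row.value);
console.log(happyBirthdates);
```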
Reduce-function
• Aggregate results for a “view” (after the
  Map-function)

function reduce(mood, listOfBirthdates) {
  return averageBirthDate(listOfBirthdates);
}



• The Map phase is easy to distribute, but it is also
  easy to write poor Reduce-functions
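The slide leaves `averageBirthDate` undefined. One possible version, together with a hypothetical `reduceView` driver that groups view rows by key before reducing, is sketched below. The sample rows are invented, and the sketch ignores staged reduction: if a database applies reduce to partial results, an average of averages would be wrong, which is one way to end up with a poor Reduce-function.

```javascript
// One possible averageBirthDate: average the dates as millisecond timestamps.
function averageBirthDate(listOfBirthdates) {
  const total = listOfBirthdates
    .map((d) => new Date(d).getTime()) // 'YYYY-MM-DD' strings are parsed as UTC midnight
    .reduce((sum, t) => sum + t, 0);
  return new Date(total / listOfBirthdates.length).toISOString().slice(0, 10);
}

function reduce(mood, listOfBirthdates) {
  return averageBirthDate(listOfBirthdates);
}

// Hypothetical driver: group view rows by key, then reduce each group.
function reduceView(rows, reduceFn) {
  const groups = new Map();
  for (const { key, value } of rows) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const result = {};
  for (const [key, values] of groups) result[key] = reduceFn(key, values);
  return result;
}

const rows = [
  { key: 'Happy', value: '2007-04-01' },
  { key: 'Happy', value: '2007-04-03' },
  { key: 'Angry', value: '2005-10-12' },
];
const averages = reduceView(rows, reduce);
console.log(averages);
```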
Theorems
• CAP
 • Consistency,
    Availability,
    Partition tolerance
 • “Pick two”
• N/R/W (adjusting CAP)
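As a rough illustration of the N/R/W knobs, assuming the usual Dynamo-style meanings (N replicas per item, W replicas that must acknowledge a write, R replicas consulted on a read): when R + W > N, every read set overlaps every write set, so a read reaches at least one replica holding the latest acknowledged write. The function name below is invented for the sketch.

```javascript
// N = replicas per item, W = write acknowledgements required, R = replicas read.
function quorumsOverlap({ n, r, w }) {
  // R + W > N means any R replicas and any W replicas share at least one member.
  return r + w > n;
}

const overlapping = quorumsOverlap({ n: 3, r: 2, w: 2 }); // reads see the latest acknowledged write
const fastButLoose = quorumsOverlap({ n: 3, r: 1, w: 1 }); // quicker, but reads may be stale
console.log(overlapping, fastButLoose);
```

Lowering R or W trades consistency for latency and availability, which is the sense in which these settings "adjust" CAP.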
No consistency?
Eventual consistency
Why NoSQL?
• Schema-free
• Massive data stores
• Scalability
• Some services simpler to implement than
  using RDBMS
• Great fit for many “Web 2.0” services
Why NOT NoSQL

• RDBMS databases and tools are mature
• NoSQL implementations often “alpha”
• Data consistency, transactions
• “Don’t scale until you need it”
RDBMS vs. NoSQL

• Strong consistency vs. Eventual consistency
• Big datasets vs. HUGE datasets
• Scaling is possible vs. Scaling is easy
• SQL vs. Map-Reduce
• Good availability vs. Very high availability
Questions?

Editor's notes

  1. Hello... Today, I’m going to talk about one of the latest buzzwords, the so-called “NoSQL” databases. A little disclaimer: I personally have real-life experience with only one NoSQL database, called CouchDB. I use CouchDB at my Haukut.fi service. So, why did I want to talk about the subject? I wanted to learn this stuff myself: what alternatives there are to CouchDB, and whether I should perhaps pick another option for my next-generation Haukut.fi. Also, I strongly feel that relational databases, if not replaced, will at least get strong competition from these NoSQL solutions. This will not happen within all domains and services, but for simple consumer web services it will eventually happen.
  2. OK, let’s quickly review the database paradigms we have to choose from. We have relational databases -- the ones you use here at Futurice every day in most of our projects. Then there is NoSQL. NoSQL is a relatively new term now getting hot and popular in the blogosphere and on Twitter. Like many buzzwords out there, NoSQL does not have a definite definition. It could refer to all those databases where you don’t have to use SQL. Some more friendly and wise people would put it more softly: “not only SQL”. Well, I am not even going to try to give a definition, but instead I will list what kinds of data stores are usually categorized under this umbrella term. They are: ... These I am going to talk about today, but then there are also others, such as object databases, and they are out of the scope of this presentation.
  3. Let’s start with something familiar. The great promise of relational databases is that they are ACID. They all support a very strong query language called SQL. And these are some familiar examples of relational database implementations.
  4. This could be a visual representation of a relational database and the relationships between the tables. We have dogs. A dog must have a name and it may have a mood, a birthdate and a color. We all know that dogs can really speak, and they speak by barking. Dogs may also comment on barks made by others. Quite simple. The content of the table ‘Dog’ could be something like this. We can see that Stella is a happy dog and was born on April Fools’ Day. Nothing special here, either. You know the stuff.
  5. OK, that was quick. So, let’s get down to business :) The simplest form of NoSQL is the key-value store. They are really simple; you could say “One key...” All in all, a key-value store is just a persistent hash. The value part can be anything. The key-value store does not care a bit about its content. Example implementations. Amazon Dynamo is a very important one, because those Amazon guys published some theoretical material, and other NoSQL databases are partly based on these writings.
  6. If you want to visualize it, you can see that we have a string-based key, and that the value could be a serialized Java object, or anything at all.
  7. Document databases. The basic idea is still quite simple. They are key-value stores, but the difference is that the value... Because of this, querying the data is possible to some degree. There might be a query language like SQL, but that is not actually very common. The examples of document databases are...
  8. What is the difference between this picture and the one two slides earlier? The “value” is now called “document”. That’s it!? Well, the document here has a structure. The structure could be JSON, XML or anything. Having a structure means that we might do something with the data, not just return a binary blob. But having a structure does not mean it has to have a predefined structure. With relational databases you have to define a schema before you can store data. If you compare this document here with our relational example, you can see that in the document we do not have a “color”. It could be there, but it does not need to be there. On the other hand, if we want to add a new attribute, we just add it. With a relational database we need to adjust the structure of the table, and that’s not always so pleasant. What about relationships? Well, if the relationship is strong enough, you could add it as a part of the document itself. You could do...
  9. ...this. Here, barks and comments are just a part of the dog document. I would not say that this would be a smart move, because the document might become huge. In the Haukut.fi service, “dog” is one document, and “bark” is another. But comments made on a bark ARE part of the bark document.
  10. The next category does not have a widely accepted name. Some would call them “wide column stores” and others might say they are “BigTable clones”. Whatever you call them, they conceptually reside somewhere between key-value stores and relational databases. There is no schema, but the data is still semi-structured. Like in relational databases, you could think that there are rows, but they can have any number of columns, and there is no need to store NULL values. Again, if you think of it in terms of relational databases, and talk about “rows” and “tables”, you might just get more confused. I still am a bit confused myself :-) This is one definition... It might not make you any wiser. Hopefully an example will help, but before that let’s see the example implementations. Cassandra is probably the most interesting one, because it is getting a lot of attention. And being the store under the hood of Facebook is quite a good reference, or what do you think :-)
  11. OK, here’s the example I promised. I could not figure out the best way to depict this, but let’s hope this works. Like in the previous examples, we have a dog called Stella. Stella’s ID is dog_12 and Stella has a number of attributes such as birthdate, mood and name. Internally, we could store this information in a table structure, much like in a relational database. Now, if we want to add an attribute “color” to Stella, we can do it easily. OK, let’s look at the terms at the top of the picture. “Row-id” is the “key” to the stored item. “Column family” and “title” together form a “column”. The difference between these is that the “title” is dynamic and you can define new titles on the fly. But the column family is more or less fixed. It is a bit like a “table” in a relational database and it is costly to update its name, for instance. The “time” is simply the version of an attribute. If we look at the attribute “mood” here, you will see that Stella has been angry from time 11 on, and she became happy at time 45. If you query an attribute without a time, you would simply get the latest value. A record is not tied to a single column family. Like in the document database example, I could say “barks are attributes of a dog”. Like this. Perhaps this explains why they are called “wide column stores”, as a record can easily consist of a large number of attributes.
  12. Then there are graph databases. Someone would perhaps leave them out of the “NoSQL” family, but the authors themselves shout that they should belong to the hype. I don’t really know too much about them. Neo4j seems to be the most popular one.
  13. Here’s an example of what a graph database could look like. Again, there is our dog_12 having a name, mood, birthdate and so on. The relationships within graph databases are strong. What I mean to say is...
  14. Relationships in... There is no REAL relationship between the tables, you may... In graph databases, however, the relationships are... It means that you can do very efficient calculations you might need in some social applications. I’m talking about friends and friends-of-friends here. What about the other NoSQL databases? You could say that... BUT just like in relational databases, you may... The only difference between these sentences is that here I talk about “constraints” and here “validations”... and again, they are, in a sense, the same thing. Why, then, are relational databases so good at handling relationships? It is because they have...
  15. ...SQL. And great support for indexing! NoSQL databases, on the other hand, might... And they usually don’t... But, on the positive side, you... However, compared to the power of SQL, isn’t this quite disappointing?
  16. Well, how do you query NoSQL then? Of course, you can access a document with a key you know. ...and with a BigTable you can also query per attribute like this. With graph databases you do “graph traversals”. Usually there is an API which might support queries such as “give me all the dogs”. Some NoSQL solutions also provide an SQL-like query language. Then you can integrate with various search engines, and some NoSQL solutions may provide integration out of the box. ...but the most common way seems to be to use Map-Reduce functions.
  17. When Google popularized the term Map-Reduce, the first sentence of the publication was: “...” The big idea is that, without any understanding of parallel and distributed programming, users can write quite simple Map and Reduce functions to solve all kinds of problems. These functions can then be run in a big cluster of processes and computers. The Map and Reduce functions could be written in any language, but quite often NoSQL databases provide a JavaScript-based solution. The main point is not the language. Rather, you just need a means to process a record, and then a way to return the processed data back to the pipeline.
  18. A very simple example of a Map function could be this. The parameter passed to the Map function is always a single document. The example here checks that the given document is a dog, and if it is, it returns the dog’s mood as the key and the birthdate as the value. Now, imagine that we have 10 million dogs in our database. It is very easy to distribute this function because, given a single document, you should always get the same key/value pair back. It is a little bit like rendering an animated 3D movie, where you distribute the rendering of each individual frame to a number of workers, and the result is the frame number as the key and the bitmap as the value. From a NoSQL database’s point of view, you may use Map functions to generate views of your data. Internally, the database caches the results so that the data is very fast to access. With this Map function defined as a view, I could easily find the birthdates of happy dogs.
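The Map function described above can be sketched roughly as follows. The slide itself most likely showed JavaScript. This is a Python stand-in, and the helper `build_view`, the field names and the bark document are assumptions made for the illustration. The two dogs reuse the values from the relational table shown earlier in the deck.

```python
def map_dog(doc):
    """Emit (mood, birthdate) for dog documents and nothing for other documents."""
    if doc.get("type") == "Dog":
        yield doc.get("mood"), doc.get("birthdate")


def build_view(docs, map_fn):
    """Hypothetical helper: run map_fn over every document and group values by key."""
    view = {}
    for doc in docs:
        # Each document is processed on its own, which is what makes the
        # map phase easy to distribute across many workers.
        for key, value in map_fn(doc):
            view.setdefault(key, []).append(value)
    return view


docs = [
    {"type": "Dog", "name": "Stella", "mood": "Happy", "birthdate": "2007-04-01"},
    {"type": "Dog", "name": "Wimma", "mood": "Hungry", "birthdate": None},
    {"type": "Bark", "dog": "dog_12", "bark": "Woof!"},  # made-up non-dog document
]

view = build_view(docs, map_dog)
happy_birthdates = view.get("Happy", [])
```

Looking up the key “Happy” in the resulting view gives the birthdates of happy dogs. A real database would persist and cache this view instead of rebuilding it on every query.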
  19. The Reduce function exists to aggregate results from a dataset. A common scenario would be to calculate a sum of values. Here’s an example of quite a stupid reduce function. You always pass two parameters to the reduce phase: a key and the values for the given key. This reduce function would simply calculate the average ages of happy dogs, angry dogs, bored dogs and so on. And as a footnote: you know it is easy to write poorly performing SQL, but you can just as easily write poor reduce functions.
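A reduce step along those lines might look like the sketch below. It assumes the map phase has already grouped birthdates by mood. The extra `today` parameter, the grouped sample data and all dates other than Stella’s are invented for the illustration.

```python
from datetime import date


def reduce_average_age(key, birthdates, today):
    """Return the average age in years for one key, skipping missing birthdates."""
    ages = [(today - b).days / 365.25 for b in birthdates if b is not None]
    return sum(ages) / len(ages) if ages else None


# Values already grouped by key, as a map phase would deliver them.
grouped = {
    "Happy": [date(2007, 4, 1), date(2005, 4, 1)],
    "Angry": [date(2009, 3, 9)],
    "Bored": [None],
}

today = date(2010, 3, 9)  # fixed reference date so the result is repeatable
averages = {
    mood: reduce_average_age(mood, birthdates, today)
    for mood, birthdates in grouped.items()
}
```

One reason an average makes a poor reduce function is that partial results cannot be combined safely: the average of two averages is generally not the overall average. In systems that re-reduce intermediate results, returning a (sum, count) pair and dividing at the very end is usually the safer design.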
  20. If you want to get deeper into the subject, you should probably google this keyword: CAP. In short, the CAP theorem says that if you have a distributed data system, you can only get two out of these three features. Consistency: meaning that once you write something into the database, all the readers of the system will get the same result. Availability: meaning the database will function even if a single component in the system fails. Partition tolerance is easiest to explain using an example. Imagine we have a distributed database that has nodes all over Europe and the United States. Now, if the network connections between the continents suddenly disappear, we will have two partitions. If the system is still able to operate normally, allowing both reads and writes, and is able to recover once the connections between Europe and the USA work again, the system is called “partition tolerant”. If you think of a relational database, there is no such concept as partition tolerance at all. However, relational databases are very consistent and can be quite highly available. Most NoSQL databases focus on providing very high availability. Most of the systems are partition tolerant, but there are also systems that choose consistency over partition tolerance. Then, there are some databases that let you decide the level of CAP you want. If you are interested, you could try googling N/R/W.
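For readers who follow up on N/R/W: in Dynamo-style systems, N is commonly the number of replicas that hold a value, R the number of replicas that must answer a read, and W the number that must acknowledge a write. The usual rule of thumb is that a read is guaranteed to see the latest acknowledged write when R + W > N, because every read set then overlaps every write set. The function name below is made up, and real systems add details such as conflict resolution that this sketch ignores.

```python
def overlapping_quorums(n, r, w):
    """Return True when every read set must share a replica with every write set."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


strict = overlapping_quorums(3, 2, 2)  # reads and writes always overlap
fast = overlapping_quorums(3, 1, 1)    # quicker, but a read may miss the latest write
```

Lowering R and W trades consistency for latency and availability, which is the kind of per-application tuning the talk refers to.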
  21. Ok, doesn’t it sound quite bad if a database is not consistent? If you write a comment on Facebook, some of your friends will see it immediately and others only after a couple of seconds? The database designers (and I) think that consistency is not always the top priority. What matters is that the data will be...
  22. ...eventually consistent. Eventual consistency means that the consistent state may be reached instantly, after a few seconds, or only after network connections have been restored. But it will eventually be reached. If you get better performance, better availability and better scalability, ACID-level consistency might not be that important after all. And of course, this depends on the domain and the application you are building. Eventual consistency might not be a good idea when you are transacting money or doing other such critical tasks.
  23. Why should you think about a NoSQL solution for your next project? Being schema-free is a HUGE help for many problems. And it almost always makes the life of a developer much more pleasant. If you need to store a massive amount of data, or know you need to scale easily, NoSQL might be for you. I could also claim that some services are perhaps easier to implement with a NoSQL solution than with a relational database. And this is very true for many simple Web 2.0 services.
  24. Then, why is NoSQL not always the best choice? First, relational databases have been around for many decades, and they are mature. The developer tools are mature. NoSQL solutions, on the other hand, are often new projects with great ideas and even greater promises, but only “alpha” quality. If you need strong data consistency, or if you need uncompromised support for transactions, use a relational database. And the last point. We often tend to think about scalability issues too early. It might not be a bad choice to write an application with the tools you know best, and when the time comes and you need to scale, then consider whether a NoSQL solution could solve some of your scalability issues.
  25. This is my last slide. If you want to compare relational databases with NoSQL databases, these could be the main points. We talked about consistency. In the relational model, with ACID operations, it is very strong. Relational databases support rather big datasets, but some NoSQL solutions give you almost unlimited scalability in terms of data size. Scaling can mean many things, but you could safely say that NoSQL solutions usually scale much better than relational databases do. SQL is a wonderful query language, and NoSQL solutions often do not support any kind of query language. If they do, the languages are likely not as expressive as SQL. On the other hand, Map-Reduce can solve some problems very efficiently. And in theory at least, high availability is also easier to achieve with many NoSQL solutions. And remember, NoSQL solutions are very different from each other. And this slide perhaps simplifies things too much. But it is still a very good slide to finish this presentation. Thank you!