This document provides an overview of different database types including relational, NoSQL, document, key-value, graph, and column family databases. It discusses the history and drivers behind the development of NoSQL databases, as well as concepts like horizontal scaling, the CAP theorem, and eventual consistency. Specific databases are also summarized, including MongoDB, Redis, Neo4j, and HBase.
1. NoSQL Databases
2. Agenda
● History
● Relational databases
● Horizontal vs vertical scaling
● CAP theorem
● Document databases
● Key value databases
● Graph databases
● Column family databases
3. History
● Non-SQL (not a traditional tabular database)
● Facebook, Google, Amazon, etc. (big data and real-time applications)
● Horizontal scaling is a problem in relational databases
● Not only SQL (SQL-like queries)
4. Relational Databases :)
● MySQL, Oracle, SQL Server, Postgres, etc.
● The carpenter's hammer (the default tool for every job)
● Easy & Popular
● Avoid data duplication, at the cost of complex queries
● Atomicity (transactions)
5. Relational Databases :(
● Defined schema, optional attributes (NULLs)
● Use joins to aggregate related data
● Large data VOLUME and high rate of READ (scalability)
17. SQL Vs NoSQL
Relational Databases                         | NoSQL Databases
Vertical scaling, limited horizontal scaling | Horizontal scaling
Consistent                                   | Consistent or eventually consistent
Scalable reads                               | Scalable reads/writes
Transactions on multiple tables              | Difficult to support transactions
No partition tolerance                       | Partition tolerance
Schema/tables                                | Schemaless
Flexible queries (joins)                     | Limited queries
18. 1) Document Databases
● Simple & popular
● Close to relational database
● MongoDB was a rising star in 2009
22. MongoDB Conclusion
● Simple
● Scalable
● Embedded document
● CP
● No joins
● May need to duplicate data
● Writes should go through the master node
● Built-in Geo-spatial support
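The "embedded document", "no joins", and "may need to duplicate data" points can be illustrated with plain Python dictionaries standing in for documents. This is only a sketch: no MongoDB driver is involved, and the field names (`user_id`, `addresses`, `city`) are hypothetical.

```python
# Relational style: two "tables" linked by an id, combined at read time.
users = {1: {"name": "Alice"}}
addresses = [
    {"user_id": 1, "city": "Cairo"},
    {"user_id": 1, "city": "Berlin"},
]

def cities_with_join(user_id):
    """Emulate a join by scanning the addresses table for matching rows."""
    return [a["city"] for a in addresses if a["user_id"] == user_id]

# Document style: the related data is embedded in the parent document.
user_doc = {
    "_id": 1,
    "name": "Alice",
    "addresses": [{"city": "Cairo"}, {"city": "Berlin"}],
}

def cities_embedded(doc):
    """No join needed: the addresses travel inside the document."""
    return [a["city"] for a in doc["addresses"]]
```

The trade-off is that an address shared by several users would have to be copied into each user document, which is where the duplication comes from.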
30. Redis in Memory
● In memory: no instant persistence by default
● Persist periodically by taking snapshots
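A toy model of snapshot-based persistence, assuming a JSON string stands in for the snapshot file on disk. The class and method names are hypothetical and are not Redis commands; the point is only that writes made after the last snapshot do not survive a restart.

```python
import json

class SnapshotStore:
    """In-memory key-value store that persists only when save() is called."""

    def __init__(self):
        self.data = {}          # live dataset, held in memory
        self.snapshot = "{}"    # stand-in for the snapshot file on disk

    def set(self, key, value):
        self.data[key] = value  # in-memory only, not yet durable

    def save(self):
        """Take a point-in-time snapshot of the whole dataset."""
        self.snapshot = json.dumps(self.data)

    def crash_and_restart(self):
        """Lose memory contents and reload from the last snapshot."""
        self.data = json.loads(self.snapshot)
```

For example, after `set("a", 1)`, `save()`, `set("b", 2)`, and a restart, only `"a"` is still present.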
31. Redis CP
● Sharding (A,B,C)
● Replication A => A1, B => B1, C => C1
● If master B fails, B1 is promoted to be the master
● Redis is NOT strongly consistent (e.g. if both A and A1 fail)
● Riak is AP
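The sharding and replication layout above can be sketched as follows. The assumptions are loud ones: keys are routed with a simple CRC32 modulo, which is a stand-in and not Redis's actual slot assignment, and replication is modeled as an explicit `replicate()` step so that the effect of a failover on unreplicated writes is visible. All class and method names are hypothetical.

```python
import zlib

class Shard:
    """One master (e.g. A) with one replica (e.g. A1)."""

    def __init__(self, name):
        self.name = name
        self.master = {}
        self.replica = {}

    def write(self, key, value):
        self.master[key] = value      # writes go to the master only

    def replicate(self):
        """Copy the master's state to the replica (asynchronous in spirit)."""
        self.replica = dict(self.master)

    def fail_over(self):
        """The master dies and the replica is promoted in its place.
        Any write that was not replicated yet is lost."""
        self.master = self.replica
        self.replica = {}

class Cluster:
    def __init__(self, names=("A", "B", "C")):
        self.shards = [Shard(n) for n in names]

    def shard_for(self, key):
        # crc32 is deterministic across runs, unlike hash() on strings.
        return self.shards[zlib.crc32(key.encode()) % len(self.shards)]

    def set(self, key, value):
        self.shard_for(key).write(key, value)

    def get(self, key):
        return self.shard_for(key).master.get(key)
```

A write that reaches a master but is not replicated before `fail_over()` disappears after promotion, which is one way a setup like this falls short of strong consistency.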
32. Redis Conclusion
● Light & Compact
● Key-value
● Complex data types
● Fast in memory
● Dataset should be less than RAM size
● Transforming data, caching, messaging
● CP but not strongly consistent
● Flexible persistence levels
● Rarely used alone
33. 3) Graph Databases
● Directed graph
● Node has properties
● Relation has properties
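A minimal property-graph sketch in plain Python, assuming nothing about any particular graph database. The names `Node`, `Relation`, `Graph`, and `outgoing` are hypothetical; the sketch only shows that both nodes and relations carry properties and that relations have a direction.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    properties: dict = field(default_factory=dict)

@dataclass
class Relation:
    source: str   # id of the node the edge starts from
    target: str   # id of the node the edge points to
    kind: str     # relation type, e.g. "FOLLOWS"
    properties: dict = field(default_factory=dict)

class Graph:
    def __init__(self):
        self.nodes = {}
        self.relations = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def relate(self, source, kind, target, **properties):
        self.relations.append(Relation(source, target, kind, properties))

    def outgoing(self, source, kind):
        """Follow relations of one kind in their direction only."""
        return [r.target for r in self.relations
                if r.source == source and r.kind == kind]
```

With `relate("alice", "FOLLOWS", "bob", since=2012)`, traversing from `alice` finds `bob`, while traversing from `bob` finds nothing, because the graph is directed.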
36. Graph Databases (AP)
● Tens of billions of nodes and edges
● No sharding; the whole graph is replicated
● High availability over consistency
● Elects a gold master, but writes can go to slaves directly
● Community edition is free, but the full version is NOT
38. Column-Family Databases
● In RDBMS, heavy writes, so store rows as a bulk
● In columns, heavy reads, store columns together
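The difference between the two layouts can be sketched with plain Python data structures. The sample table and the function names are hypothetical; the sketch shows the same data stored row by row and column by column, and why a read over one attribute touches less data in the column layout.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Carol", "age": 41},
]

def to_columns(rows):
    """Pivot a row-oriented table into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

# Column-oriented layout: each attribute is stored together.
columns = to_columns(rows)

def average_age_row_store(rows):
    # Visits every full record even though only one field is needed.
    return sum(r["age"] for r in rows) / len(rows)

def average_age_column_store(columns):
    # Reads a single column and nothing else.
    return sum(columns["age"]) / len(columns["age"])
```

Appending a record is a single write in the row layout but one write per column in the column layout, which matches the heavy-writes versus heavy-reads split above.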
39. HBase
● Database for HDFS (RDBMS vs files)
● Widely used with Hadoop
● Scalability! At least five nodes in production
● Facebook messaging system infrastructure (2010)
41. HBase Column Family
● Key-value pairs (map of maps)
● Column families must be defined up front, but the columns within them are schema-less
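The "map of maps" idea can be sketched as a dictionary of rows, each holding a dictionary of columns. This is a toy model, not the HBase client API: the `Table` class is hypothetical, and it only borrows the `family:qualifier` naming for column keys. Column families are fixed when the table is created, while qualifiers can be anything.

```python
class Table:
    """Toy model: row key -> {"family:qualifier": value}."""

    def __init__(self, column_families):
        self.column_families = set(column_families)  # fixed up front
        self.rows = {}

    def put(self, row_key, family, qualifier, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers are free-form, so no schema check is made on them.
        column = f"{family}:{qualifier}"
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, family, qualifier):
        return self.rows[row_key][f"{family}:{qualifier}"]
```

Two rows in the same table can therefore hold completely different sets of columns, as long as every column belongs to a declared family.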
42. HBase Versioning
● Versioning
● It became a map of map of map (asc, asc, desc)
● Garbage collector for expired data
● Everything is binary
● Compression rate
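Adding versions turns the structure into three nested maps: row key, column, then timestamp. The sketch below assumes "(asc, asc, desc)" means rows and columns are ordered ascending and timestamps descending, so the newest version comes first. The `max_versions` limit is a hypothetical stand-in for garbage collection of expired data, and the class is not the HBase API.

```python
class VersionedTable:
    """Toy model: row key -> column -> {timestamp: value}."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.rows = {}

    def put(self, row_key, column, timestamp, value):
        versions = self.rows.setdefault(row_key, {}).setdefault(column, {})
        versions[timestamp] = value
        # Crude garbage collection: drop the oldest versions over the limit.
        for old in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old]

    def get(self, row_key, column):
        """Return the newest version of a cell."""
        versions = self.rows[row_key][column]
        return versions[max(versions)]

    def scan(self):
        """Yield cells ordered by row asc, column asc, timestamp desc."""
        for row_key in sorted(self.rows):
            for column in sorted(self.rows[row_key]):
                versions = self.rows[row_key][column]
                for ts in sorted(versions, reverse=True):
                    yield row_key, column, ts, versions[ts]
```

A plain read returns the latest value, while a scan exposes the older versions that have not been collected yet.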
43. FB Messaging Index Table
● The row keys are user IDs
● Column qualifiers are words that appear in that user's messages
● Timestamps are message IDs of messages that contain that word
● Value is offset of word in message
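The index layout described above maps directly onto the three nested maps: user ID, then word, then message ID, with the offset as the value. The sketch below is an illustration only; the function names are hypothetical, and "offset" is assumed to mean the word's position in the message, counted from zero.

```python
def index_message(index, user_id, message_id, text):
    """Add one message to the index: user -> word -> {message_id: offset}."""
    for offset, word in enumerate(text.lower().split()):
        cell = index.setdefault(user_id, {}).setdefault(word, {})
        # Keep the first occurrence if a word repeats within a message.
        cell.setdefault(message_id, offset)

def search(index, user_id, word):
    """Return IDs of the user's messages containing the word, newest first."""
    versions = index.get(user_id, {}).get(word.lower(), {})
    return sorted(versions, reverse=True)
```

Because message IDs take the place of timestamps, a lookup on one row and one column qualifier returns every matching message for that user, newest first.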
44. HBase Vs Cassandra
● HBase runs on Hadoop; Cassandra is standalone
● HBase community is more active
● HBase is CP, Cassandra is AP
● Cassandra is more suitable for highly concurrent writes