Dan Gunter, Computational Research Division, Lawrence Berkeley National Laboratory
Introduction
Terminology: NOSQL and “Schemaless”
NOSQL past and present
  Pre-RDBMS, RDBMS era, NOSQL
Pre-relational structured storage systems
  Computer Systems News, 11/28/83
The relational model
  [Figure: a relation (table) named by a relation variable R (table name). Its heading is an unordered set of attributes (columns) A1 ... An; each tuple (row) holds Value1 ... Valuen, and the tuples are also unordered.]
Recent NOSQL database products
  Columnar or Extensible record: Google BigTable, HBase, Cassandra, HyperTable
  Document Store: SimpleDB, CouchDB, MongoDB, Lotus Domino, Mnesia
  Graph DB: Neo4j, FlockDB, InfiniteGraph
  Key/Value Store: Memcached, Redis, Tokyo Cabinet, Dynamo, Project Voldemort, Dynomite, Riak
Why NOSQL?
CAP Theorem
  Forfeit partition-tolerance: single-site databases, cluster databases, LDAP
  Forfeit availability: distributed databases w/pessimistic locking, majority protocols
  Forfeit consistency: Coda, web caching, DNS, Dynamo
  Figure label: “All robust distributed systems live here”
CAP, ACID, and BASE
  [Figure labels: ACID, BASE]
Pioneers
  These implementations are not publicly available, but the distributed-system techniques that they integrated to build huge databases have been imitated, to a greater or lesser extent, by every implementation that followed.
Google BigTable
BigTable’s Data Model
  Google’s Bigtable is essentially a massive, distributed 3-D spreadsheet. It doesn’t do SQL, there is limited support for atomic transactions, nor does it support the full relational database model. In short, in these and other areas, the Google team made design trade-offs to enable the scalability and fault-tolerance Google apps require.
  - Robin Harris, StorageMojo (blog), 2006-09-08
  [Figure: one row, “com.cnn.www”, with columns name, contents:, anchor:cnnsi.com and anchor:my.look.ca. The cells hold “CNN”, “CNN.com” and three versions of “<html>...” at timestamps t3, t5 and t6.]
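The “3-D spreadsheet” description can be read as a map from (row, column, timestamp) to a value. The following is a minimal sketch of that idea only, not Bigtable's API: the class name `SparseTable`, its methods, and the integer timestamps and cell strings are assumptions made for illustration.

```python
from collections import defaultdict


class SparseTable:
    """Toy (row, column, timestamp) -> value map (hypothetical, for illustration)."""

    def __init__(self):
        # row key -> column key -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        eligible = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(eligible)] if eligible else None


# Illustrative data modeled on the figure; timestamps 3, 5, 6 stand in for t3, t5, t6.
table = SparseTable()
table.put("com.cnn.www", "contents:", 3, "<html>... (t3)")
table.put("com.cnn.www", "contents:", 5, "<html>... (t5)")
table.put("com.cnn.www", "contents:", 6, "<html>... (t6)")
table.put("com.cnn.www", "anchor:cnnsi.com", 6, "CNN")

latest = table.get("com.cnn.www", "contents:")
as_of_t4 = table.get("com.cnn.www", "contents:", timestamp=4)
```

Cells that were never written simply do not exist in the map, which is what makes the table sparse: `table.get("com.cnn.www", "anchor:example.org")` returns `None` rather than an empty column.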
Tablets and SSTables
Use of Bloom Filters to optimize lookups
  [Figure: a Bloom filter bit array holding { x, y, z }, queried for w.]
  w is not in { x, y, z } because it hashes to one position with a 0.
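The membership test in the figure can be sketched as follows. This is a generic Bloom filter, not Bigtable's implementation; the class name, the default sizes, and the use of seeded SHA-256 as the hash family are assumptions chosen to keep the example self-contained.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, item):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # A single 0 among the item's positions proves the item was never added.
        return all(self.bits[pos] for pos in self._positions(item))


bloom = BloomFilter()
for key in ("x", "y", "z"):
    bloom.add(key)
```

A lookup for a key that was never added usually hits at least one 0 bit, so the store can skip reading the file on disk. A `True` answer only means “possibly present”, so it must still be confirmed by the real lookup.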
Chubby and Paxos
  Each “DB” is a replica.
  Each server runs on its own host.
  Google tends to run 5 servers, with only one being the “master” at any one time.
  [Figure: five Chubby servers, each with its own DB; one server is marked Master.]
What about CAP?
Amazon’s Dynamo
Dynamo data partitioning and replication
  [Figure: hash ring using consistent hashing. Each host “node” is responsible for several virtual nodes on the ring. An item hashes to a spot on the ring; the coordinator node and the replicas are marked.]
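The ring in the figure can be sketched in a few lines. This is an illustration of consistent hashing with virtual nodes, not Dynamo's code: the names `HashRing`, `ring_position` and `preference_list`, the host labels, and the choice of SHA-256 for ring positions are all assumptions.

```python
import bisect
import hashlib


def ring_position(label):
    """Map a string to a position on the ring (SHA-256 is a stand-in hash)."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return int.from_bytes(digest[:16], "big")


class HashRing:
    def __init__(self, hosts, vnodes_per_host=4):
        # Every host owns several virtual nodes scattered around the ring.
        self._ring = sorted(
            (ring_position(f"{host}#{i}"), host)
            for host in hosts
            for i in range(vnodes_per_host)
        )
        self._positions = [pos for pos, _ in self._ring]

    def preference_list(self, key, n_replicas=3):
        """Walk clockwise from the key's spot, collecting distinct hosts.

        The first host plays the coordinator role; the rest hold replicas.
        """
        start = bisect.bisect_right(self._positions, ring_position(key))
        hosts = []
        for offset in range(len(self._ring)):
            _, host = self._ring[(start + offset) % len(self._ring)]
            if host not in hosts:
                hosts.append(host)
            if len(hosts) == n_replicas:
                break
        return hosts


ring = HashRing(["host-a", "host-b", "host-c", "host-d"])
owners = ring.preference_list("item-42")
```

Skipping virtual nodes that belong to a host already in the list keeps the replicas on distinct physical machines, which is the point of separating virtual nodes from host nodes.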
Eventual consistency and sloppy quorum
Replica synchronization with Merkle trees
  For Dynamo, the “data” are the keys stored in a given virtual node.
  Each node is a hash of its children.
  If two top hashes match, then the trees are the same.
Infrastructure (at scale) is fractal
The Gold Rush
  Columnar or Extensible record: Google BigTable, HBase, Cassandra, HyperTable
  Document Store: SimpleDB, CouchDB, MongoDB, Lotus Domino, Mnesia
  Graph DB: Neo4j, FlockDB, InfiniteGraph
  Key/Value Store: Memcached, Redis, Tokyo Cabinet, Dynamo, Project Voldemort, Dynomite, Riak, Hibari
Key/Value Store
  Memcached, Redis, Tokyo Cabinet, Dynamo, Project Voldemort, Dynomite, Riak, Hibari
Project Voldemort
  Type: Key/Value Store
  License: Apache 2.0
  Language: Java
  Company: LinkedIn
  Web: project-voldemort.com
Riak
  Type: Key/Value Store
  License: Open-Source
  Language: Erlang
  Company: Basho
  Web: wiki.basho.com/display/RIAK/Riak/
  Example: Map/reduce with the Python API
    result = self.client.add(bucket.get_name()).map("Riak.mapValuesJson").reduce("Riak.reduceSum").run()
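What the chained map/reduce call computes can be mimicked in plain Python without a Riak cluster. This sketch assumes, as the built-in function names suggest, that the map phase parses each stored value as JSON and the reduce phase sums the results; the helper names and the sample bucket contents are hypothetical and are not part of the Riak client.

```python
import json
from functools import reduce


def map_values_json(stored_value):
    """Map phase: turn one stored object into a list of parsed values."""
    return [json.loads(stored_value)]


def reduce_sum(values):
    """Reduce phase: collapse all mapped values into a single-element list."""
    return [reduce(lambda a, b: a + b, values, 0)]


def run_map_reduce(stored_values):
    mapped = []
    for raw in stored_values:
        mapped.extend(map_values_json(raw))
    return reduce_sum(mapped)


# Hypothetical bucket holding three JSON-encoded numbers.
bucket_contents = ["2", "3", "4"]
result = run_map_reduce(bucket_contents)
```

In the real system the map phase runs next to the data on each node and only the mapped values travel over the network to the reduce phase; the local loop here hides that distribution.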
Hibari
  Type: Key/Value Store
  License: Open-Source
  Language: Erlang
  Company: Gemini Mobile
  Web: sourceforge.net/projects/hibari/
Columnar or Extensible record
  Google BigTable, HBase, Cassandra, HyperTable
Cassandra
  Type: Extensible column store
  License: Apache 2.0
  Language: Java
  Company: Apache Software Foundation
  Web: cassandra.apache.org
Document Store
  SimpleDB, CouchDB, MongoDB, Lotus Domino, Mnesia
CouchDB
  Type: Document store
  License: Apache 2.0
  Language: Erlang
  Company: Apache Software Foundation
  Web: couchdb.org
MongoDB
  Type: Document store
  License: GPL
  Language: C++
  Company: 10gen
  Web: mongodb.org
  http://www.slideshare.net/mongodb/mongodb-replica-sets
Mnesia
  Type: Document store
  License: EPL*
  Language: Erlang
  Company: Ericsson
  Web: www.erlang.org
  Papers: http://www.erlang.se/publications/mnesia_overview.pdf
  * Mozilla Public License modified to conform with laws of Sweden (more herring)
Why do we care about Mnesia / OTP?
  Erlang query for “all females” in company*
    females() ->
        F = fun() ->
                Q = query [E.name || E <- table(employee),
                                     E.sex = female] end,
                mnemosyne:eval(Q)
            end,
        mnesia:transaction(F).
  *I know, but it’s not my example. This is right out of the manual.
Comparison of MongoDB and CouchDB
  Database          Inserts/sec
  MongoDB           16,000
  CouchDB           70
  CouchDB, batch    1,800
Schemaless data modeling http://labs.mudynamics.com/2010/04/01/why-nosql-is-bad-for-startups/
Example from distributed monitoring
  All of these are data modeling “anti-patterns” for relational DBs.
What’s wrong with EAV?
What about queries?
SQL vs. M/R and other models
Conclusions
Selected references

Schemaless Databases


Editor's Notes

  1. This talk really comes out of my attempt to orient myself in this space. My background is in monitoring distributed systems, where I am concerned with scalable collection and data analysis. But I also want to know what I can use for semi-structured data “in the small”.
  2. Where it applies, the distinction between relatively fixed schemas and dynamic ones is more technically significant than what query syntax is used to access the data, as has been shown by a number of products that provided a dialect of SQL as an alternative query language either alongside or on top of their native syntax.
  3. PICK -- MultiValue (aka PICK) databases are developed at TRW in 1965.
     M[umps] -- According to a comment from Scott Jones, M[umps] is developed at Mass General Hospital in 1966. It is a programming language that incorporates a hierarchical database with B+ tree storage.
     IBM IMS -- IBM IMS, a hierarchical database, is developed with Rockwell and Caterpillar for the Apollo space program in 1966.
     ISM -- InterSystems develops the ISM product family, succeeded by the Open M product, all M[umps] implementations.
     ANSI M -- M[umps] is approved as an ANSI standard language in 1977.
     AT&T DBM -- In 1979 Ken Thompson creates DBM, which is released by AT&T. At its core it is a file-based hash.
     TDBM -- TDBM supports atomic transactions.
     NDBM -- NDBM is the Berkeley version of DBM, supporting multiple databases open at the same time.
     SDBM -- SDBM is another clone of DBM, created mainly for licensing reasons.
     GT.M -- GT.M is the first version of a key-value store with a focus on high-performance transaction processing. It is open sourced in 2000.
     BerkeleyDB -- BerkeleyDB is created at Berkeley in the transition from 4.3BSD to 4.4BSD. Sleepycat Software is started as a company in 1996 when Netscape needs new features for BerkeleyDB. It is later acquired by Oracle, which still sells and maintains BerkeleyDB.
     Lotus Domino -- Lotus Notes, or rather its server part, Lotus Domino, which really is a document database, has its initial release in 1989 and is now sold by IBM. It has evolved a lot from the early versions and is now a full office and collaboration suite.
     GDBM -- GDBM is the GNU project clone of DBM.
     Mnesia -- Mnesia is developed by Ericsson as a soft real-time database to be used in telecom. It is relational in nature but uses Erlang itself, rather than SQL, as its query language.
     Caché -- InterSystems Caché is launched in 1997 and is a hybrid, so-called post-relational database. It has object interfaces, SQL, PICK/MultiValue and direct manipulation of data structures. It is an M[umps] implementation.
     Metakit -- Metakit is started in 1997 and is probably the first document-oriented database. It supports smaller datasets than the ones in vogue nowadays.
     Neo4j -- The graph database Neo4j is started in 2000.
     db4o -- db4o, an object database for Java and .NET, is started in 2000.
     QDBM -- QDBM is a re-implementation of DBM with better performance, by Mikio Hirabayashi.
     Memcached -- Memcached is started in 2003 by Danga to power LiveJournal. Memcached isn't really a database since it's memory-only, but there is soon a version with file storage called memcachedb.
     Infogrid graph DB -- The Infogrid graph database is started as closed source in 2005 and open sourced in 2008.
     CouchDB -- CouchDB is started in 2005 and provides a document database inspired by Lotus Notes. The project moves to the Apache Foundation in 2008.
     Google BigTable -- Google BigTable is started in 2004 and the research paper is released in 2006.
     JackRabbit -- JackRabbit is started in 2006 as an implementation of JSR 170 and 283.
     Tokyo Cabinet -- Tokyo Cabinet, a successor to QDBM by Mikio Hirabayashi, is started in 2006.
     Dynamo -- The research paper on Amazon Dynamo is released in 2007.
     MongoDB -- The document database MongoDB is started in 2007 as part of an open source cloud computing stack, with its first standalone release in 2009.
     Cassandra -- Facebook open-sources the Cassandra project in 2008.
     Voldemort -- Project Voldemort is a replicated database with no single point of failure. Started in 2008.
     Dynomite -- Dynomite is a Dynamo clone written in Erlang.
     Terrastore -- Terrastore is a scalable elastic document store started in 2009.
     Redis -- Redis is a persistent key-value store started in 2009.
     Riak -- Riak is another Dynamo-inspired database started in 2009.
     HBase -- HBase is a BigTable clone for the Hadoop project, while Hypertable is another BigTable-type database, also from 2009.
     Vertexdb -- Vertexdb, another graph database, is started in 2009.
     Term: NOSQL -- Eric Evans of Rackspace, a committer on the Cassandra project, introduces the term NoSQL, often used in the sense of “Not only SQL”, to describe the surge of new projects and products.
  4. Both of these systems are still used. An open-source version of M, called GT.M, has been available since 2000. M is still used by the US Dept of Veterans Affairs, and also by Ameritrade (Caché: 12B transactions a day), ING Direct, and others in the financial industry. The IBM IMS system is still very actively used today, in particular for the US Federal Reserve. According to Wikipedia, odds are good your ATM transaction hits an IMS database. Chinese banks have purchased IMS technology. IMS includes a separate “transaction management” (TM) system.
  5. E. F. Codd’s seminal 1970 paper, “A Relational Model of Data for Large Shared Data Banks”, laid out a solid mathematical basis for databases. In contrast to the hierarchical and network models of the time, relational algebra, an offshoot of first-order logic, provided a declarative means of reasoning about the data that did not depend on the implementation. SQL is “loosely based” on relational algebra.
  6. This taxonomy will be explored in more detail later. The point for now is that there are several different types of datastores, with a number of examples of each, and, referring back to the timeline, most of these implementations have appeared in the past few years.
  7. Corporations (once again) found themselves at the forefront of systems research. But what was that research? (Read on...)
  8. If nothing else, you will be able to refer to the “CAP theorem” the next time your networked demo breaks.
  9. In his talk, Brewer said “there is almost no work in this area”. I think that the existence of scalable (schemaless) database systems is proof that this has changed.
  10. Pictured is Parliament, pioneers of funk!
  11. Trivia: what major movie was about producing a script called “Chubby Rain”?
  12. Example of a BigTable that stores web pages (directly out of the paper). The row names are reversed URLs (so sorted rows tend to group things by the same domain). There are two column families, “contents” and “anchor”. In this example, each anchor cell has one version, and the contents column has 3.
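The layout in note 12 can be sketched as a sorted map from (row, column, timestamp) to a value. The following is a minimal in-memory illustration, not Google's API: `ToyTable`, `reverse_domain`, and the cell contents are hypothetical names and values chosen for the example.

```python
from collections import defaultdict


def reverse_domain(host):
    """'www.cnn.com' -> 'com.cnn.www', so pages from one domain sort together."""
    return ".".join(reversed(host.split(".")))


class ToyTable:
    """Hypothetical stand-in for a BigTable-style table (illustration only)."""

    def __init__(self):
        # row key -> column name ("family:qualifier") -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self.rows[row][column][timestamp] = value

    def get(self, row, column):
        """Return the most recent version of a cell, or None if absent."""
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self):
        """Return row keys in sorted order, the order a scan would visit them."""
        return sorted(self.rows)


table = ToyTable()
row = reverse_domain("www.cnn.com")
# The "contents" family holds three versions of the page.
for ts in (3, 5, 6):
    table.put(row, "contents:", ts, f"<html>v{ts}</html>")
# Each "anchor" cell holds a single version.
table.put(row, "anchor:cnnsi.com", 9, "CNN")
table.put(row, "anchor:my.look.ca", 8, "CNN.com")
```

Reading `table.get(row, "contents:")` returns the newest of the three stored versions, while each anchor column has exactly one.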
  13. Paxos is an old and well-known algorithm. The Chubby “Database” is really a set of directories with small “lockfiles”. Each tablet server gets one Chubby directory, and each of its tablets is a lockfile.
  14. These core services included the Amazon e-commerce shopping cart.
  15. Each virtual node is responsible for keys between itself and its predecessor on the ring. The mapping of a single node to a variable number of virtual nodes on the hash ring accounts for heterogeneity (host “power”) in the system.
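The ring described in note 15 can be sketched as follows. This is a minimal illustration under stated assumptions, not Dynamo's implementation: `HashRing`, `ring_position`, the host names, and the vnode counts are all hypothetical, and MD5 is used only as a convenient hash.

```python
import bisect
import hashlib


def ring_position(label):
    """Hash a string to an integer position on the ring (illustrative choice of hash)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, vnodes_per_host):
        # vnodes_per_host, e.g. {"big": 8, "small": 2}: a more powerful host
        # gets more virtual nodes and therefore a larger share of the keys.
        self._ring = sorted(
            (ring_position(f"{host}#{i}"), host)
            for host, count in vnodes_per_host.items()
            for i in range(count)
        )
        self._positions = [pos for pos, _ in self._ring]

    def owner(self, key):
        """Host of the first virtual node at or after the key's position.

        That virtual node covers the range between its predecessor and itself.
        """
        idx = bisect.bisect_left(self._positions, ring_position(key))
        if idx == len(self._ring):
            idx = 0  # past the last virtual node: wrap around the ring
        return self._ring[idx][1]


ring = HashRing({"big": 8, "small": 2})
host = ring.owner("cart:42")  # always the same host for the same key
```

Giving `"big"` four times as many virtual nodes as `"small"` is how the sketch models heterogeneity; on average it should end up owning a proportionally larger share of the key space.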
  16. The quorum is “sloppy” because R and W refer to the number of healthy nodes, which may change between the write and subsequent read of the key.
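A small sketch of why the quorum in note 16 is "sloppy": reads and writes go to the first N healthy nodes found while walking the ring, so the set of nodes can differ between a write and a later read. The function name, node labels, and the values of N, R, and W below are assumptions for illustration only.

```python
def first_healthy(ring_order, down, n):
    """First n healthy nodes, walking the ring from the key's position."""
    return [node for node in ring_order if node not in down][:n]


N, R, W = 3, 2, 2  # illustrative settings, not taken from the talk
ring_order = ["A", "B", "C", "D", "E"]

# At write time node B is down, so the write lands on A, C and D.
write_set = first_healthy(ring_order, down={"B"}, n=N)

# By read time B has recovered, so the read consults A, B and C.
read_set = first_healthy(ring_order, down=set(), n=N)

# The two sets are not identical, so the overlap that a strict quorum
# (fixed membership, R + W > N) would guarantee is no longer assured.
overlap = set(write_set) & set(read_set)
```

In this run the sets still share A and C, but nothing in the scheme forces that; the membership depends on which nodes were healthy at each moment.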
  17. (Who knows what this is?) The picture is a close-up of a vegetable: the “Chou Romanesco” cauliflower.
  18. Particularly appropriate analogy because of the industry’s tendency to rush towards shiny new technologies! Following sections will examine each of these categories and walk through one publicly available product (or more) for each, with the exception of graph databases, which I simply haven’t taken the time to grok yet.
  19. Both Voldemort and the next database, Riak, claim they were “inspired” by the early Dynamo paper.
  20. In the diagram, the green nodes are head; orange middle; red are tails. The white arrows are write requests, grey read requests, and red are (all) replies.
  21. Developed by former engineers from the BigTable and Dynamo projects; in heavy use at Facebook. For consistency level: Zero = totally asynchronous; Any = 1 node, including hinted handoff; Quorum = R/2+1, where R = number of replicas. Reads at Zero or Any don’t make sense: Zero means no data, and Any could be the wrong node, which can’t do read repairs and has just the handed-off version.
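The quorum formula in note 21 is integer arithmetic: R/2+1 with the division rounded down. A minimal sketch (the function name `quorum` is hypothetical, not a Cassandra API):

```python
def quorum(replicas):
    """Quorum size from note 21: R/2 + 1, using integer division."""
    return replicas // 2 + 1


# Any two quorums over the same replica set must share at least one node,
# because two quorums together always count more nodes than there are replicas.
for r in range(1, 10):
    assert 2 * quorum(r) > r
```

For example, 3 replicas give a quorum of 2 and 5 replicas give a quorum of 3, so a quorum write followed by a quorum read touches at least one replica in common.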
  22. Has a nice Web UI called “Futon”. Yes, everything is a reclining furniture pun.
  23. Obviously, this is at best a micro-benchmark. YCSB stands for Yahoo! Cloud Serving Benchmark.
  24. I won’t attempt to actually cover Map/Reduce, and don’t know Erlang. Instead: what impact do these databases have on data modeling efforts?