ToroDB 
A BRIDGE BETWEEN 
THE NOSQL AND RELATIONAL WORLDS 
Álvaro Hernández <aht@8kdata.com>
About *8Kdata* 
www.8kdata.com 
● Research & Development in databases 
●Consulting, Training and Support in PostgreSQL 
● ...
How big is “NoSQL”? 
Source: 451 Research
Why people want “NoSQL”? 
● Schema-less 
● High availability 
● It's cool
The schema-less fallacy 
{ 
“name”: “Álvaro”, 
“surname”: “Hernández”, 
“height”: 200, 
“hobbies”: [ 
“PostgreSQL”, “triat...
The schema-less fallacy 
{ 
“name”: “Álvaro”, 
“surname”: “Hernández”, 
“height”: 200, 
“hobbies”: [ 
“PostgreSQL”, “triat...
The schema-less fallacy: BSON 
{ 
“name”: (string) “Álvaro”, 
“surname”: (string) “Hernández”, 
“height”: (number) 200, 
“...
The schema-less fallacy 
● It's not schema-less 
● It is “attached-schema” 
● It carries an overhead which is not 0
High availability: at what cost? 
Jepsen!!! :) 
MongoDB: 
➔ Unacknowledged: 42% data loss 
➔ Safe: 37% data loss 
➔ Only m...
More NoSQL struggle 
● Durability is sometimes not guaranteed 
on a single node 
● Programming for AP systems may be a 
bi...
Can we do a better “NoSQL”? 
● Document model is very appealing to 
many. Let's offer it 
● DRY: why not use relational 
d...
Schema-attached repetition 
{ “a”: 1, “b”: 2 } 
{ “a”: 3 } 
{ “a”: 4, “c”: 5 } 
{ “a”: 6, “b”: 7 } 
{ “b”: 8 } 
{ “a”: 9, ...
Schema-attached repetition 
How data is stored in schema-less
Pettus and BTP inspired us 
https://wiki.postgresql.org/images/b/b4/Pg-as-nosql-pgday-fosdem-2013.pdf 
http://www.slidesha...
ToroDB 
ToroDB – Teaser https://flic.kr/p/9HzWhT
What is ToroDB 
● Open source, document-oriented, 
JSON database that runs on top of 
PostgreSQL 
● JSON documents are sto...
ToroDB benefits 
● 100% durable database 
● High concurrency and performance 
● Compatible with existing mongo API 
progra...
ToroDB storage 
● Data is stored in tables 
● JSON documents are split by hierarchy 
levels, and each (plain) level goes t...
ToroDB storage (II) 
● A “structure” table keeps the 
subdocument “schema” 
● Keys in JSON are mapped to attributes, 
whic...
ToroDB storage (III) 
How data is stored in ToroDB
ToroDB storage internals 
{ 
"name": "ToroDB", 
"data": { 
"a": 42, "b": "hello world!" 
}, 
"nested": { 
"j": 42, 
"deepe...
ToroDB storage internals 
The document is split into the following subdocuments: 
{ "name": "ToroDB", "data": {}, "nested"...
ToroDB storage internals 
select * from demo.t_3 
┌─────┬───────┬────────────────────────────┬────────┐ 
│ did │ index │ _...
ToroDB storage internals 
select * from demo.structures 
┌─────┬──────────────────────────────────────────────────────────...
ToroDB storage and I/O savings 
29% - 68% storage required, 
compared to Mongo 2.6
ToroDB performance
ToroDB performance (II)
ToroDB: query “by structure” 
● ToroDB is effectively partitioning by 
type 
● Structures (schemas, partitioning types) 
a...
ToroDB: Developer Preview 
● ToroDB launched on October 2014, as 
a Developer Preview. Support for CRUD 
and most of the S...
ToroDB: Developer Preview 
● Clone the repo, build with Maven 
● Or download the JAR: 
http://maven.torodb.com/release/com...
ToroDB: Community Response
ToroDB: Community Response
ToroDB: Roadmap 
● Current Developer Preview is 
single-node 
● Version 1.0: 
➔ Expected Q1 2015 
➔ Production-ready 
➔ Mo...
Big Data speaking mongo: 
Vertical ToroDB 
What if we use CitusData's cstore to store 
the JSON documents?
Big Data speaking mongo: 
Vertical ToroDB 
1.17% - 20.26% storage required, 
compared to Mongo 2.6
“Software acknowledgements” 
● PostgreSQL! 
● The Netty framework 
● jOOQ 
● Guava, guice, findbugs 
● Hikari CP
ToroDB: a bridge between the NoSQL and Relational worlds
Próxima SlideShare
Cargando en…5
×

ToroDB: a bridge between the NoSQL and Relational worlds

17.138 visualizaciones

Publicado el

In the recent years, NoSQL databases have been gaining a lot of traction. Most of them haven been designed and written from scratch. Building on the principles of schema-less and high scalability, they offer a distinct approach to that of relational databases. But rather than re-using what the industry has learned in the last 3 decades of database development, most of these databases are re-inventing the wheel and designing the data storage layers -one of the toughest part when building a database- from scratch. ToroDB is a database that uses instead relational databases as well-known, durable, scalable and fast -despite what many would saystorage layers as a foundation to build a schema-less, document-oriented, scalable database. This project has been recently published as open-source software. It will effectively be the very first general-purpose database ever built in Spain.

Document databases store documents, which are basically hierarchical, nested data structures of sets of key-value pairs. Current approaches to store them in relational databases are limited to storing documents in some form of binary serialization. We found is a set of algorithms to transform a document into a set of document-parts that can individually be stored in relational tables. This includes dynamic creation of tables, when needed, to match a table's structure to that of the information to be stored.

This means there is no engineering effort required in building the storage subsystem, which should handle durability, isolation and concurrency –all of which are tough properties to implement. But even more importantly, there are very significant performance advantages, both in query time and storage savings.

Query time improves as queries targeting subsets of the documents (which are most of the queries) need only to address a subset of the data -as it is partitioned into tables- rather than reading the whole database. Storage savings are achieved by avoiding repetition of the schema of every document –many documents share the same schema (“structure”) but all them need to repeat that. Our benchmarks shows that JSON documents require in ToroDB 29% to 68% of the storage required for the same data on a MongoDB database. These means significant less I/O, significant less cost, and greater (vertical) scalability.

This presentation shows how ToroDB works, how the JSON documents are split into tables. Why current document-oriented databases fail to maximize the performance of BigData requirements –ToroDB also includes a mechanism for storing in columnar format parts of the documents to improve aggregate-type queries, obtaining impressive performance benefits. And, finally, how this all can be done in a compatible way with existing systems: ToroDB includes a layer that natively speaks the MongoDB protocol, hence becoming a drop-in replacement for MongoDB installations, but running on top of existing relational databases.

Publicado en: Software
  • Sé el primero en comentar

ToroDB: a bridge between the NoSQL and Relational worlds

  1. 1. ToroDB A BRIDGE BETWEEN THE NOSQL AND RELATIONAL WORLDS Álvaro Hernández <aht@8kdata.com>
  2. 2. About *8Kdata* www.8kdata.com ● Research & Development in databases ●Consulting, Training and Support in PostgreSQL ● Founders of PostgreSQL España, 3rd largest PUG in the world (322 members as of today) ● About myself: CEO at 8Kdata: @ahachete http://linkd.in/1jhvzQ3
  3. 3. How big is “NoSQL”? Source: 451 Research
  4. 4. Why people want “NoSQL”? ● Schema-less ● High availability ● It's cool
  5. 5. The schema-less fallacy { “name”: “Álvaro”, “surname”: “Hernández”, “height”: 200, “hobbies”: [ “PostgreSQL”, “triathlon” ] }
  6. 6. The schema-less fallacy { “name”: “Álvaro”, “surname”: “Hernández”, “height”: 200, “hobbies”: [ “PostgreSQL”, “triathlon” ] } metadata → Isn't that... schema?
  7. 7. The schema-less fallacy: BSON { “name”: (string) “Álvaro”, “surname”: (string) “Hernández”, “height”: (number) 200, “hobbies”: { “0”: (string) “PostgreSQL” , “1”: (string) “triathlon” metadata → Isn't that... schema? } }
  8. 8. The schema-less fallacy ● It's not schema-less ● It is “attached-schema” ● It carries an overhead which is not 0
  9. 9. High availability: at what cost? Jepsen!!! :) MongoDB: ➔ Unacknowledged: 42% data loss ➔ Safe: 37% data loss ➔ Only majority is safe http://aphyr.com/posts/284-call-me-maybe-mongodb
  10. 10. More NoSQL struggle ● Durability is sometimes not guaranteed on a single node ● Programming for AP systems may be a big burden ● Most (all?) NoSQL databases wrote their storage from scratch. Journaling, concurrency are really hard
  11. 11. Can we do a better “NoSQL”? ● Document model is very appealing to many. Let's offer it ● DRY: why not use relational databases? They are proven, durable, concurrent and flexible ● Why not base it on relational databases, like PostgreSQL?
  12. 12. Schema-attached repetition { “a”: 1, “b”: 2 } { “a”: 3 } { “a”: 4, “c”: 5 } { “a”: 6, “b”: 7 } { “b”: 8 } { “a”: 9, “b”: 10 } { “a”: 11, “b”: 12, “j”: 13 } { “a”: 14, “c”: 15 } Counting “document types” in collections of millions: at most, 1000s of different types
  13. 13. Schema-attached repetition How data is stored in schema-less
  14. 14. Pettus and BTP inspired us https://wiki.postgresql.org/images/b/b4/Pg-as-nosql-pgday-fosdem-2013.pdf http://www.slideshare.net/nosys/billion-tables-project-nycpug-2013
  15. 15. ToroDB ToroDB – Teaser https://flic.kr/p/9HzWhT
  16. 16. What is ToroDB ● Open source, document-oriented, JSON database that runs on top of PostgreSQL ● JSON documents are stored relationally, not as a blob: significant storage and I/O savings ● Wire-protocol compatibility with Mongo
  17. 17. ToroDB benefits ● 100% durable database ● High concurrency and performance ● Compatible with existing mongo API programs, clients ● Full set of JSON operations (MongoDB's “SELECT” API)
  18. 18. ToroDB storage ● Data is stored in tables ● JSON documents are split by hierarchy levels, and each (plain) level goes to a different table ● Subdocuments are classified by “type”, which maps to tables
  19. 19. ToroDB storage (II) ● A “structure” table keeps the subdocument “schema” ● Keys in JSON are mapped to attributes, which retain the original name ● Tables are created dinamically and transparently to match the exact types of the documents
  20. 20. ToroDB storage (III) How data is stored in ToroDB
  21. 21. ToroDB storage internals { "name": "ToroDB", "data": { "a": 42, "b": "hello world!" }, "nested": { "j": 42, "deeper": { "a": 21, "b": "hello" } } }
  22. 22. ToroDB storage internals The document is split into the following subdocuments: { "name": "ToroDB", "data": {}, "nested": {} } { "a": 42, "b": "hello world!"} { "j": 42, "deeper": {}} { "a": 21, "b": "hello"}
  23. 23. ToroDB storage internals select * from demo.t_3 ┌─────┬───────┬────────────────────────────┬────────┐ │ did │ index │ _id │ name │ ├─────┼───────┼────────────────────────────┼────────┤ │ 0 │ ¤ │ x5451a07de7032d23a908576d │ ToroDB │ └─────┴───────┴────────────────────────────┴────────┘ select * from demo.t_1 ┌─────┬───────┬────┬──────────────┐ │ did │ index │ a │ b │ ├─────┼───────┼────┼──────────────┤ │ 0 │ ¤ │ 42 │ hello world! │ │ 0 │ 1 │ 21 │ hello │ └─────┴───────┴────┴──────────────┘ select * from demo.t_2 ┌─────┬───────┬────┐ │ did │ index │ j │ ├─────┼───────┼────┤ │ 0 │ ¤ │ 42 │ └─────┴───────┴────┘
  24. 24. ToroDB storage internals select * from demo.structures ┌─────┬────────────────────────────────────────────────────────────────────────────┐ │ sid │ _structure │ ├─────┼────────────────────────────────────────────────────────────────────────────┤ │ 0 │ {"t": 2, "data": {"t": 1}, "nested": {"t": 3, "deeper": {"i": 1, "t": 1}}} │ └─────┴────────────────────────────────────────────────────────────────────────────┘ select * from demo.root; ┌─────┬─────┐ │ did │ sid │ ├─────┼─────┤ │ 0 │ 0 │ └─────┴─────┘
  25. 25. ToroDB storage and I/O savings 29% - 68% storage required, compared to Mongo 2.6
  26. 26. ToroDB performance
  27. 27. ToroDB performance (II)
  28. 28. ToroDB: query “by structure” ● ToroDB is effectively partitioning by type ● Structures (schemas, partitioning types) are cached in ToroDB memory ● Queries only scan a subset of the data. ● Negative queries are served directly from memory.
  29. 29. ToroDB: Developer Preview ● ToroDB launched on October 2014, as a Developer Preview. Support for CRUD and most of the SELECT API ● github.com/torodb ● RERO policy. Comments, feedback, patches... greatly appreciated ● AGPLv3
  30. 30. ToroDB: Developer Preview ● Clone the repo, build with Maven ● Or download the JAR: http://maven.torodb.com/release/com/torodb/toro db/0.11/torodb-0.11-jar-with-dependencies.jar ●Usage: java -jar torodb-version.jar –help java -jar torodb/target/torodb-version.jar -d dbname -u dbuser -P 27017 Connect with normal mongo console!
  31. 31. ToroDB: Community Response
  32. 32. ToroDB: Community Response
  33. 33. ToroDB: Roadmap ● Current Developer Preview is single-node ● Version 1.0: ➔ Expected Q1 2015 ➔ Production-ready ➔ MongoDB Replication support (Paxos-based replication protocol?) ➔ Very high compatibility with Mongo API
  34. 34. Big Data speaking mongo: Vertical ToroDB What if we use CitusData's cstore to store the JSON documents?
  35. 35. Big Data speaking mongo: Vertical ToroDB 1.17% - 20.26% storage required, compared to Mongo 2.6
  36. 36. “Software acknowledgements” ● PostgreSQL! ● The Netty framework ● jOOQ ● Guava, guice, findbugs ● Hikari CP

×