In the recent years, NoSQL databases have been gaining a lot of traction. Most of them haven been designed and written from scratch. Building on the principles of schema-less and high scalability, they offer a distinct approach to that of relational databases. But rather than re-using what the industry has learned in the last 3 decades of database development, most of these databases are re-inventing the wheel and designing the data storage layers -one of the toughest part when building a database- from scratch. ToroDB is a database that uses instead relational databases as well-known, durable, scalable and fast -despite what many would saystorage layers as a foundation to build a schema-less, document-oriented, scalable database. This project has been recently published as open-source software. It will effectively be the very first general-purpose database ever built in Spain.
Document databases store documents, which are basically hierarchical, nested data structures of sets of key-value pairs. Current approaches to store them in relational databases are limited to storing documents in some form of binary serialization. We found is a set of algorithms to transform a document into a set of document-parts that can individually be stored in relational tables. This includes dynamic creation of tables, when needed, to match a table's structure to that of the information to be stored.
This means there is no engineering effort required in building the storage subsystem, which should handle durability, isolation and concurrency –all of which are tough properties to implement. But even more importantly, there are very significant performance advantages, both in query time and storage savings.
Query time improves as queries targeting subsets of the documents (which are most of the queries) need only to address a subset of the data -as it is partitioned into tables- rather than reading the whole database. Storage savings are achieved by avoiding repetition of the schema of every document –many documents share the same schema (“structure”) but all them need to repeat that. Our benchmarks shows that JSON documents require in ToroDB 29% to 68% of the storage required for the same data on a MongoDB database. These means significant less I/O, significant less cost, and greater (vertical) scalability.
This presentation shows how ToroDB works, how the JSON documents are split into tables. Why current document-oriented databases fail to maximize the performance of BigData requirements –ToroDB also includes a mechanism for storing in columnar format parts of the documents to improve aggregate-type queries, obtaining impressive performance benefits. And, finally, how this all can be done in a compatible way with existing systems: ToroDB includes a layer that natively speaks the MongoDB protocol, hence becoming a drop-in replacement for MongoDB installations, but running on top of existing relational databases.