What database?
- a practical guide to selection from
NoSQL, SQL and Polyglot data stores
Regunath B
twitter.com/RegunathB
github.com/regunathb
Engineering @ CureFit-HealthFace, ex-Flipkart Infra services, Built Aadhaar

State of the Database Landscape - the options

What to look for?
Over-the-hood considerations

Data Manipulation
• Query Language
• SQL [1]
• KV operations: put(k,v), get(k), remove(k)
• Graph query: g.V().has('name','hercules').out('father').out('father').values('name')
• Bulk Processing
• Loading
• Data Export & Transfer

ACID properties (Property: Impact)
ACID 1.0
• Atomicity: Transaction Support
• Consistency: Data Staleness
• Isolation: Ordering
• Durability: Surviving Crashes
ACID 2.0 (For Scaling)
• Associative: Transaction Support (Limited)
• Commutative: Relaxed Ordering (High-Throughput) - CRDTs [2]
• Idempotent: At-least-once delivery (Eventually-Consistent)
• Distributed: High-Availability
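
The ACID 2.0 properties can be illustrated with a grow-only counter, one of the simplest CRDTs. This is a minimal sketch and not Riak's implementation; the class name `GCounter` and the replica ids are assumptions made for illustration. It shows that a merge based on element-wise maximum is associative, commutative and idempotent, so replicas can exchange state in any order and tolerate redelivery.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GCounter:
    """Grow-only counter: one monotonically increasing slot per replica."""
    counts: dict = field(default_factory=dict)

    def increment(self, replica_id: str, amount: int = 1) -> "GCounter":
        # Each replica only ever bumps its own slot.
        updated = dict(self.counts)
        updated[replica_id] = updated.get(replica_id, 0) + amount
        return GCounter(updated)

    def merge(self, other: "GCounter") -> "GCounter":
        # Element-wise max is associative, commutative and idempotent.
        keys = self.counts.keys() | other.counts.keys()
        return GCounter(
            {k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys}
        )

    @property
    def value(self) -> int:
        return sum(self.counts.values())


# Hypothetical replicas that accepted writes independently.
a = GCounter().increment("replica-a").increment("replica-a")
b = GCounter().increment("replica-b")
c = GCounter().increment("replica-c", 3)

assert a.merge(b) == b.merge(a)                      # commutative
assert a.merge(b).merge(c) == a.merge(b.merge(c))    # associative
assert a.merge(b).merge(b) == a.merge(b)             # idempotent

total = a.merge(b).merge(c).value
print(total)  # 6
```

Because redelivering the same state changes nothing, at-least-once delivery is enough for the replicas to converge.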

Wire-protocol, Standard interfaces
• Wire Protocol
• Custom protocols over TCP/IP, Http, gRPC
• Support popular database protocols
• Postgres - e.g. CockroachDB
• Memcached - e.g. Couchbase
• Standard Interfaces
• JDBC - e.g. Apache Hive, Phoenix for HBase, Vitess

Schema(less) support
Schema is not Evil
• Why Schema-less
• Sparse-Metric and Entity-Attribute-Value storage needs
• Frequent changes
• Why Schema
• Understanding structure of data
• Referential Data integrity, Quality of data controlled by Data dictionary
• In-between
• Schema-less but require Column Indexes (e.g. ColumnFamily model of KV stores)

CAP theorem critique
• “one example of a fundamental trade-off between safety and liveness in fault-prone systems” [3]
• Too simplistic [4]
• Choosing CA is mostly impractical (it implies a single-node database); the critique therefore applies to CP or AP
• CAP-Availability and CAP-Consistency are a spectrum, not binary choices
• e.g. AP-Reads, AP-Writes, Strong Consistency vs. Eventual Consistency
• Define application trade-offs, validate impact on NFRs - Latency, Throughput
• Good starting point for considering Polyglot persistence

Polyglot Persistence - Tiered Data stores
Source: Aadhaar technology white-paper

Polyglot Persistence - CQRS
Source: Microsoft MSDN
Source: Flipkart catalog Write & Read[5]
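
The CQRS split can be sketched as a write store that records changes and a separate read model that is projected from them. This is a minimal in-memory illustration, not the Microsoft or Flipkart design; `WriteStore`, `CategoryReadModel` and the product data are hypothetical names chosen for the example.

```python
from collections import defaultdict


class WriteStore:
    """System of record: rows keyed by product id, plus an ordered change log."""

    def __init__(self):
        self.rows = {}
        self.change_log = []  # consumed by read-side projections

    def upsert_product(self, product_id, name, category):
        self.rows[product_id] = {"name": name, "category": category}
        self.change_log.append((product_id, name, category))


class CategoryReadModel:
    """Denormalised view optimised for one query: products by category."""

    def __init__(self):
        self.by_category = defaultdict(dict)
        self._category_of = {}
        self.applied = 0  # position in the change log already projected

    def catch_up(self, change_log):
        for product_id, name, category in change_log[self.applied:]:
            old = self._category_of.get(product_id)
            if old is not None and old != category:
                del self.by_category[old][product_id]  # product moved category
            self.by_category[category][product_id] = name
            self._category_of[product_id] = category
        self.applied = len(change_log)

    def products_in(self, category):
        return sorted(self.by_category.get(category, {}).values())


writes = WriteStore()
reads = CategoryReadModel()
writes.upsert_product("p1", "Yoga mat", "fitness")
writes.upsert_product("p2", "Dumbbell", "fitness")

stale = reads.products_in("fitness")   # projection has not caught up yet
reads.catch_up(writes.change_log)
fresh = reads.products_in("fitness")
print(stale, fresh)  # [] ['Dumbbell', 'Yoga mat']
```

The read side is eventually consistent with the write side, which is the trade-off that allows each side to use a data store suited to its access pattern.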

Polyglot Persistence - pluggable storage, secondary indices
• Healthcare Graph data (Conditions, Symptoms) on Apache Titan
• Mostly Read-only queries - Point lookups, one-hop traversals
• AP-Read data (Storage engine: Cassandra)
• Also query by properties of Vertex/Edge (Secondary indices in ES)
Source: CureFit Symptoms & Conditions datastore

Others
• Performance benchmarks - Latency, Throughput, Concurrency - e.g. Graph DBs benchmark [6]
• Operations & Maintenance - e.g. MySQL as backend data store for Facebook TAO [7], LinkedIn Espresso [8]
• Support - Paid (single vendor vs. multiple), Community (size, composition)
• Hosted service - on public clouds as a managed service

What to look for?
Under-the-hood considerations

Database Type
• Relational
• All field values of a row stored together
• Common storage formats: B-Tree
• Better suited for OLTP
• Columnar
• All values of a column stored together
• More efficient data compression
• OLAP queries perform better
Source: https://gerardnico.com/wiki/relation/structure/column_store
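
The difference between the two layouts can be sketched with plain Python structures. This is an illustration under simplifying assumptions (in-memory data, made-up order records, run-length encoding standing in for columnar compression in general), not how any particular engine stores data.

```python
# Hypothetical order records used only for illustration.
rows = [
    {"order_id": 1, "city": "Bengaluru", "amount": 250},
    {"order_id": 2, "city": "Bengaluru", "amount": 400},
    {"order_id": 3, "city": "Chennai", "amount": 150},
]

# Row layout: all fields of a row sit together, so an OLTP-style
# point lookup touches exactly one record.
row_store = {r["order_id"]: r for r in rows}

# Columnar layout: all values of a column sit together, so an
# OLAP-style aggregate scans one list and ignores the other columns.
column_store = {name: [r[name] for r in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded


order_2 = row_store[2]
total_amount = sum(column_store["amount"])
city_runs = run_length_encode(column_store["city"])

print(order_2)       # {'order_id': 2, 'city': 'Bengaluru', 'amount': 400}
print(total_amount)  # 800
print(city_runs)     # [['Bengaluru', 2], ['Chennai', 1]]
```

Values within one column tend to repeat or share a type, which is why the columnar layout compresses better than interleaved row data.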

Database Type
• Document
• Sub-class of a KV store
• Often hierarchical (DB -> Collection -> Document)
• Often have challenges in optimising storage, due to lack of a Data Dictionary (schema-free)
• KV
• Often RAM based
• Durability through replication (sync) and persistence to disk
• Preference for LSM over in-place updates when designed for SSD
Source: https://blog.mlab.com/2014/01/how-big-is-your-mongodb/
Source: http://www.aerospike.com/technologies/

Data Organisation
• B-Tree
• Better suited for in-place updates
• Log-Structured Merge (LSM) tree
• Better suited for high insert volume
• Better suited for SSD (for reducing write amplification)
• Achieve high data locality of reference through good row-key design [9]
Source: http://www.programering.com/a/MTMwAzMwATM.html
Source: http://www.cyanny.com/2014/03/13/hbase-architecture-analysis-part1-logical-architecture/
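
The LSM write path can be sketched as an in-memory table that is flushed into immutable sorted runs, with reads consulting the newest data first. This is a toy model under stated assumptions: `TinyLSM` is a hypothetical name, runs live in memory rather than on disk, and deletes (tombstones), write-ahead logging and levelled compaction are left out.

```python
import bisect


class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # immutable sorted runs of (key, value), newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value  # writes only touch memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            # A real engine would write this run sequentially to disk.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        merged = {}
        for run in self.runs:  # oldest first, so newer values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("user:1", "a")
db.put("user:2", "b")   # memtable full: first run flushed
db.put("user:1", "c")
db.put("user:3", "d")   # second run flushed

runs_before = len(db.runs)
latest = db.get("user:1")
db.compact()
runs_after = len(db.runs)
print(runs_before, latest, runs_after)  # 2 c 1
```

Updates never modify an existing run in place, which is what makes the structure friendly to high insert volume; the cost is paid later during compaction and on reads that must check several runs.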

Replication, Consensus
• Replication
• Sync vs. Async
• No. of Replicas, Min. Replicas, Journalling, Guaranteed writes with hinted handoff
• Single master read-write (CP) vs. Replica reads (AP)
• Consensus
• Used in
• Leader election
• Committing transactions/Log replication
• Strength of protocol - Paxos, Raft, Zab etc.
• Jepsen Tests (https://jepsen.io/) - Tests ‘Safety’ of distributed databases
• e.g. CockroachDB, MongoDB, VoltDB, Solr, Elasticsearch etc.
Source: https://martin.kleppmann.com
Source: https://raft.github.io/raft.pdf
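
The sync vs. async replication trade-off can be sketched as follows. This is a simplified single-process model and not a consensus protocol: `Primary`, `Replica` and the `sync_replicas` parameter are hypothetical names, and node failures, retries and leader election are ignored.

```python
class Replica:
    def __init__(self):
        self.data = {}


class Primary:
    def __init__(self, replicas, sync_replicas):
        self.data = {}
        self.replicas = replicas
        self.sync_replicas = sync_replicas  # replicas that must apply before ack
        self.backlog = []                   # (replica, key, value) still to ship

    def write(self, key, value):
        self.data[key] = value
        for i, replica in enumerate(self.replicas):
            if i < self.sync_replicas:
                replica.data[key] = value  # synchronous: applied before ack
            else:
                self.backlog.append((replica, key, value))  # asynchronous
        return "ack"

    def drain_backlog(self):
        for replica, key, value in self.backlog:
            replica.data[key] = value
        self.backlog.clear()


r1, r2 = Replica(), Replica()
primary = Primary([r1, r2], sync_replicas=1)
primary.write("k", "v1")

sync_read = r1.data.get("k")    # already up to date
stale_read = r2.data.get("k")   # async replica has not applied the write yet
primary.drain_backlog()
caught_up_read = r2.data.get("k")
print(sync_read, stale_read, caught_up_read)  # v1 None v1
```

Reading from the asynchronous replica keeps reads available and fast but may return stale data, which is the AP-reads end of the trade-off listed above.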

Operations
• Data export & restore (RPO, RTO) - Disaster Recovery(DR)
• Tools for full export vs incremental snapshots
• Tools for restoring from exports, logs
• Piggy-back on XDC replication support to create continuous/ongoing backup & restore
• Large scale data migration [10]
• Mean Time to Recovery (MTTR) - Node failure/Minor outages
• e.g. promoting hot-standby to master
• Tools to detect failure, validate data, promote new master/leader

Cost
• Disk-Memory ratio
• Database architecture to support disk storage, size of on-disk data w.r.t RAM
• Compute required
• No. of compute nodes required to keep data on-line
• Power Consumption
• SSD based databases generally more energy efficient than HDD
• Density of storage
• Relevant when storing large data over extended periods of time
• e.g. Aadhaar enrolment raw data, Facebook photos [11]

DB-specific Optimisations to leverage RAM, reduce Disk I/O
• Data block-cache/buffer-pool
• Reduces disk I/O
• Provides lower latency on repeat reads
• Provides potentially lower latency for reads on high data locality of reference
• Bloom Filters
• Reduce disk I/O and row scanning in random key lookups
Source: https://sematext.com
Source: Cloudera
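
How a Bloom filter avoids disk reads on random key lookups can be sketched as below. This is a minimal illustration, not the filter used by any specific database; the sizes, the SHA-256 based hashing scheme and the `on_disk` dictionary standing in for a data file are assumptions.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


on_disk = {"row-17": "alice", "row-42": "bob"}  # stand-in for a data file
bloom = BloomFilter()
for k in on_disk:
    bloom.add(k)

stats = {"disk_reads": 0}


def lookup(key):
    if not bloom.might_contain(key):
        return None              # skip the disk entirely
    stats["disk_reads"] += 1     # only now pay for the disk access
    return on_disk.get(key)


found = lookup("row-42")
missing = lookup("row-99")
print(found, missing)
```

A filter never reports a stored key as absent, so correctness is preserved; an occasional false positive only costs one unnecessary disk read.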

References
• [1] - Google Spanner becoming a SQL System
• [2] - CRDTs in Riak
• [3] - Perspectives on the CAP Theorem
• [4] - Martin Kleppmann CP or AP
• [5] - Flipkart Catalog System, Datastore
• [6] - Do We Need Specialised Graph Databases?
• [7] - Facebook TAO social graph data store
• [8] - LinkedIn Espresso
• [9] - Facebook style notifications using HBase
• [10] - Flipkart DC migration
• [11] - Facebook cold storage system
