SlideShare una empresa de Scribd logo
1 de 21
Descargar para leer sin conexión
What database?
- a practical guide to selection from
NoSQL, SQL and Polyglot data stores
Regunath B
twitter.com/RegunathB
github.com/regunathb
Engineering @ CureFit-HealthFace, ex-Flipkart Infra services, Built Aadhaar
State of the Database Landscape - the options
What to look for?
Over-the-hood considerations
Data Manipulation
SQL[1]
KV operations put(k,v)
get(k)
remove(k)
Graph query
g.V().has(‘name','hercules').out('father').out('father').values('name')
Query
Language
Bulk Processing
• Loading
• Data Export &
Transfer
ACID properties
1.0
Atomicity
Consistency
Isolation
Durability
2.0
(For Scaling)
Associative
Commutative
Idempotent
Distributed
Transaction Support
Data Staleness
Ordering
Surviving Crashes
Transaction Support (Limited)
Relaxed Ordering (High- Throughput) - CRDTs [2]
Atleast-Once delivery (Eventually-Consistent)
High-Availability
Property
Impact
Wire-protocol, Standard
interfaces
• Wire Protocol
• Custom protocols over TCP/IP, Http, gRPC
• Support popular database protocols
• Postgres - e.g. CockroachDB
• Memcached - e.g. Couch base
• Standard Interfaces
• JDBC - e.g. Apache Hive, Phoenix for HBase, Vitess
Schema(less) support
Schema is
not Evil
• Why Schema-less
• Sparse-Metric and Entity-Attribute-Value storage needs
• Frequent changes
• Why Schema
• Understanding structure of data
• Referential Data integrity, Quality of data controlled by Data
dictionary
• In-between
• Schema-less but require Column Indexes (e.g. ColumnFamily
model of KV stores)
CAP theorem critique
• “one example of a fundamental trade-off between safety and
liveness in fault-prone systems” [3]
• Too simplistic [4]
• Choice of CA impractical mostly (Single node database), critique
therefore applies to CP or AP.
• CAP-Availability and CAP-Consistency is a spectrum and not binary
• e.g. AP-Reads, AP-Writes, Strong Consistency vs. Eventual
Consistency
• Define application tradeoffs, validate impact on NFRs - Latency,
Throughput
• Good starting point for considering Polyglot persistence
X
Polyglot Persistence - Tiered Data stores
Source: Aadhaar technology white-paper
Polyglot Persistence - CQRS
Source: Microsoft MSDN
Source: Flipkart catalog Write & Read[5]
Polyglot Persistence - pluggable storage,
secondary indices
• Healthcare Graph data
(Conditions, Symptoms) on
Apache Titan
• Mostly Read-only queries - Point
lookups, one-hop traversals
• AP-Read data (Storage engine :
Cassandra)
• Also query by properties of
Vertex/Edge(Secondary indices
in ES)
Source: CureFit Symptoms & Conditions datastore
Others
• Performance benchmarks - Latency, Throughput,
Concurrency - e.g. Graph DBs benchmark [6]
• Operations & Maintenance - e.g. MySQL as
backend data store for Facebook TAO [7], LinkedIn
Espresso [8]
• Support - Paid (single vendor vs. multiple),
Community (size, composition)
• Hosted service - on public clouds as a managed
service
What to look for?
Under-the-hood considerations
Database Type
• Relational
• All field values of a row stored together
• Common storage formats: BTree
• Better suited for OLTP
• Columnar
• All values of a column stored together
• More efficient data compression
• OLAP queries perform better
Source:
https://gerardnico.com/wiki/relation/structure/column_store
Database Type
• Document
• Sub-class of a KV store
• Often hierarchical (DB -> Collection ->
Document)
• Often have challenges in optimising
storage - due to lack of Data
Dictionary (schema-free)
• KV
• Often RAM based
• Durability through replication(sync)
and persistence to disk
• Preference for LSM over in-place
updates when designed for SSD
Source:
https://blog.mlab.com/2014/01/how-big-is-your-mongodb/
Source:
http://www.aerospike.com/technologies/
Data Organisation
• B-Tree
• Better suited for in-place updates
• Log Structured Merged (LSM)
• Better suited for high insert volume
• Better suited for SSD (for reducing write
amplification)
• Achieve high data locality of reference
through good row-key design [9]
Source: http://www.programering.com/a/MTMwAzMwATM.html
Source: http://www.cyanny.com/2014/03/13/hbase-architecture-
analysis-part1-logical-architecture/
Replication, Consensus
• Replication
• Sync vs. Async
• No. of Replicas, Min. Replicas, Journalling, Guaranteed writes
with hinted handoff
• Single master read-write(CP) vs. Replica reads(AP)
• Consensus
• Used in
• Leader election
• Committing transactions/Log replication
• Strength of protocol - Paxos, Raft, Zab etc.
• Jepsen Tests (https://jepsen.io/) - Tests ‘Safety’ of distributed
databases
• e.g. CockroachDB, MongoDB, VoltDB, Solr, Elastic Search etc
Source:https://martin.kleppmann.com
Source: https://raft.github.io/raft.pdf
Operations
• Data export & restore (RPO, RTO) - Disaster Recovery(DR)
• Tools for full export vs incremental snapshots
• Tools for restoring from exports, logs
• Piggy-back on XDC replication support to create continuous/ongoing
backup&restore
• Large scale data migration [10]
• Mean Time to Recovery (MTTR) - Node failure/Minor outages
• e.g. promoting hot-standby to master
• Tools to detect failure, validate data, promote new master/leader
Cost
• Disk-Memory ratio
• Database architecture to support disk storage, size of on-disk data w.r.t RAM
• Compute required
• No. of compute nodes required to keep data on-line
• Power Consumption
• SSD based databases generally more energy efficient than HDD
• Density of storage
• Relevant when storing large data over extended periods of time
• e.g. Aadhaar enrolment raw data, Facebook photos [11]
DB-specific Optimisations to leverage RAM, reduce Disk I/O
• Data block-cache/buffer-pool
• Reduces disk I/O
• Provides lower latency on repeat reads
• Provides potentially lower latency for
reads on high data locality of reference
• Bloom Filters
• Reduces disk I/O and row scanning in
random key lookups
Source: https://sematext.com
Source: Cloudera
References
• [1] - Google Spanner becoming a SQL System
• [2] - CRDTs in Riak
• [3] - Perspectives on the CAP Theorem
• [4] - Martin Kleppmann CP or AP
• [5] - Flipkart Catalog System, Datastore
• [6] - Do We Need Specialised Graph Databases?
• [7] - Facebook TAO social graph data store
• [8] - LinkedIn Espresso
• [9] - Facebook style notifications using HBase
• [10] - Flipkart DC migration
• [11] - Facebook cold storage system

Más contenido relacionado

La actualidad más candente

HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon
 
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems Cloudera, Inc.
 
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefullySQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefullyMd Kamaruzzaman
 
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...✔ Eric David Benari, PMP
 
The Holy Grail of Data Analytics
The Holy Grail of Data AnalyticsThe Holy Grail of Data Analytics
The Holy Grail of Data AnalyticsDan Lynn
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...DataStax
 
Amazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni VamvadelisAmazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni Vamvadelishuguk
 
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Amazon Web Services
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...Fwdays
 
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...Amazon Web Services
 
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...Data Con LA
 
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index StructuresHBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index StructuresCloudera, Inc.
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8MongoDB
 
Splice Machine Overview
Splice Machine OverviewSplice Machine Overview
Splice Machine OverviewKunal Gupta
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Zhenxiao Luo
 

La actualidad más candente (20)

HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
 
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
 
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefullySQL, NoSQL, Distributed SQL: Choose your DataStore carefully
SQL, NoSQL, Distributed SQL: Choose your DataStore carefully
 
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
 
Hadoop and HBase @eBay
Hadoop and HBase @eBayHadoop and HBase @eBay
Hadoop and HBase @eBay
 
The Holy Grail of Data Analytics
The Holy Grail of Data AnalyticsThe Holy Grail of Data Analytics
The Holy Grail of Data Analytics
 
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing H...
 
Amazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni VamvadelisAmazon RedShift - Ianni Vamvadelis
Amazon RedShift - Ianni Vamvadelis
 
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Introduction to Dremio
Introduction to DremioIntroduction to Dremio
Introduction to Dremio
 
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...
 
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 20...
 
Presto: SQL-on-anything
Presto: SQL-on-anythingPresto: SQL-on-anything
Presto: SQL-on-anything
 
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
 
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index StructuresHBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8
 
Splice Machine Overview
Splice Machine OverviewSplice Machine Overview
Splice Machine Overview
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017
 

Destacado

E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres Regunath B
 
Unique identification authority of india uid
Unique identification authority of india   uidUnique identification authority of india   uid
Unique identification authority of india uidAjit Dadresa
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Regunath B
 
Aesop change data propagation
Aesop change data propagationAesop change data propagation
Aesop change data propagationRegunath B
 
Hadoop at aadhaar
Hadoop at aadhaarHadoop at aadhaar
Hadoop at aadhaarRegunath B
 
Building tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systemsBuilding tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systemsRegunath B
 
Building the Flipkart phantom
Building the Flipkart phantomBuilding the Flipkart phantom
Building the Flipkart phantomRegunath B
 
Facebook style notifications using hbase and event streams
Facebook style notifications using hbase and event streamsFacebook style notifications using hbase and event streams
Facebook style notifications using hbase and event streamsRegunath B
 
practical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome thempractical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome themsaipriyadonthula
 
Oss as a competitive advantage
Oss as a competitive advantageOss as a competitive advantage
Oss as a competitive advantageRegunath B
 
Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Ali Raw
 

Destacado (14)

E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres
 
Unique identification authority of india uid
Unique identification authority of india   uidUnique identification authority of india   uid
Unique identification authority of india uid
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3
 
Uid
UidUid
Uid
 
Aesop change data propagation
Aesop change data propagationAesop change data propagation
Aesop change data propagation
 
Hadoop at aadhaar
Hadoop at aadhaarHadoop at aadhaar
Hadoop at aadhaar
 
Building tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systemsBuilding tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systems
 
Building the Flipkart phantom
Building the Flipkart phantomBuilding the Flipkart phantom
Building the Flipkart phantom
 
Facebook style notifications using hbase and event streams
Facebook style notifications using hbase and event streamsFacebook style notifications using hbase and event streams
Facebook style notifications using hbase and event streams
 
Aadhaar
AadhaarAadhaar
Aadhaar
 
Srikanth Nadhamuni
Srikanth NadhamuniSrikanth Nadhamuni
Srikanth Nadhamuni
 
practical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome thempractical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome them
 
Oss as a competitive advantage
Oss as a competitive advantageOss as a competitive advantage
Oss as a competitive advantage
 
Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)
 

Similar a What database

So You Want to Build a Data Lake?
So You Want to Build a Data Lake?So You Want to Build a Data Lake?
So You Want to Build a Data Lake?David P. Moore
 
StreamHorizon overview
StreamHorizon overviewStreamHorizon overview
StreamHorizon overviewStreamHorizon
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld
 
NoSQL – Data Center Centric Application Enablement
NoSQL – Data Center Centric Application EnablementNoSQL – Data Center Centric Application Enablement
NoSQL – Data Center Centric Application EnablementDATAVERSITY
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarKognitio
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Apache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehouseApache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehousehadoopsphere
 
Tableau on Hadoop Meet Up: Advancing from Extracts to Live Connect
Tableau on Hadoop Meet Up: Advancing from Extracts to Live ConnectTableau on Hadoop Meet Up: Advancing from Extracts to Live Connect
Tableau on Hadoop Meet Up: Advancing from Extracts to Live ConnectRemy Rosenbaum
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast DataKudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Datamichaelguia
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)Marco Gralike
 
Dissecting Scalable Database Architectures
Dissecting Scalable Database ArchitecturesDissecting Scalable Database Architectures
Dissecting Scalable Database Architectureshypertable
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Data Con LA
 
Productionizing Hadoop - New Lessons Learned
Productionizing Hadoop - New Lessons LearnedProductionizing Hadoop - New Lessons Learned
Productionizing Hadoop - New Lessons LearnedCloudera, Inc.
 
Engineering practices in big data storage and processing
Engineering practices in big data storage and processingEngineering practices in big data storage and processing
Engineering practices in big data storage and processingSchubert Zhang
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceSATOSHI TAGOMORI
 

Similar a What database (20)

So You Want to Build a Data Lake?
So You Want to Build a Data Lake?So You Want to Build a Data Lake?
So You Want to Build a Data Lake?
 
StreamHorizon overview
StreamHorizon overviewStreamHorizon overview
StreamHorizon overview
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
 
NoSQL – Data Center Centric Application Enablement
NoSQL – Data Center Centric Application EnablementNoSQL – Data Center Centric Application Enablement
NoSQL – Data Center Centric Application Enablement
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Apache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehouseApache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehouse
 
Tableau on Hadoop Meet Up: Advancing from Extracts to Live Connect
Tableau on Hadoop Meet Up: Advancing from Extracts to Live ConnectTableau on Hadoop Meet Up: Advancing from Extracts to Live Connect
Tableau on Hadoop Meet Up: Advancing from Extracts to Live Connect
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Redshift overview
Redshift overviewRedshift overview
Redshift overview
 
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast DataKudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Data
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
An AMIS overview of database 12c
An AMIS overview of database 12cAn AMIS overview of database 12c
An AMIS overview of database 12c
 
An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)
 
Dissecting Scalable Database Architectures
Dissecting Scalable Database ArchitecturesDissecting Scalable Database Architectures
Dissecting Scalable Database Architectures
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
Productionizing Hadoop - New Lessons Learned
Productionizing Hadoop - New Lessons LearnedProductionizing Hadoop - New Lessons Learned
Productionizing Hadoop - New Lessons Learned
 
Engineering practices in big data storage and processing
Engineering practices in big data storage and processingEngineering practices in big data storage and processing
Engineering practices in big data storage and processing
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data Service
 

Último

办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...ttt fff
 

Último (20)

办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
 

What database

  • 1. What database? - a practical guide to selection from NoSQL, SQL and Polyglot data stores Regunath B twitter.com/RegunathB github.com/regunathb Engineering @ CureFit-HealthFace, ex-Flipkart Infra services, Built Aadhaar
  • 2. State of the Database Landscape - the options
  • 3. What to look for? Over-the-hood considerations
  • 4. Data Manipulation SQL[1] KV operations put(k,v) get(k) remove(k) Graph query g.V().has(‘name','hercules').out('father').out('father').values('name') Query Language Bulk Processing • Loading • Data Export & Transfer
  • 5. ACID properties 1.0 Atomicity Consistency Isolation Durability 2.0 (For Scaling) Associative Commutative Idempotent Distributed Transaction Support Data Staleness Ordering Surviving Crashes Transaction Support (Limited) Relaxed Ordering (High- Throughput) - CRDTs [2] Atleast-Once delivery (Eventually-Consistent) High-Availability Property Impact
  • 6. Wire-protocol, Standard interfaces • Wire Protocol • Custom protocols over TCP/IP, Http, gRPC • Support popular database protocols • Postgres - e.g. CockroachDB • Memcached - e.g. Couch base • Standard Interfaces • JDBC - e.g. Apache Hive, Phoenix for HBase, Vitess
  • 7. Schema(less) support Schema is not Evil • Why Schema-less • Sparse-Metric and Entity-Attribute-Value storage needs • Frequent changes • Why Schema • Understanding structure of data • Referential Data integrity, Quality of data controlled by Data dictionary • In-between • Schema-less but require Column Indexes (e.g. ColumnFamily model of KV stores)
  • 8. CAP theorem critique • “one example of a fundamental trade-off between safety and liveness in fault-prone systems” [3] • Too simplistic [4] • Choice of CA impractical mostly (Single node database), critique therefore applies to CP or AP. • CAP-Availability and CAP-Consistency is a spectrum and not binary • e.g. AP-Reads, AP-Writes, Strong Consistency vs. Eventual Consistency • Define application tradeoffs, validate impact on NFRs - Latency, Throughput • Good starting point for considering Polyglot persistence X
  • 9. Polyglot Persistence - Tiered Data stores Source: Aadhaar technology white-paper
  • 10. Polyglot Persistence - CQRS Source: Microsoft MSDN Source: Flipkart catalog Write & Read[5]
  • 11. Polyglot Persistence - pluggable storage, secondary indices • Healthcare Graph data (Conditions, Symptoms) on Apache Titan • Mostly Read-only queries - Point lookups, one-hop traversals • AP-Read data (Storage engine : Cassandra) • Also query by properties of Vertex/Edge(Secondary indices in ES) Source: CureFit Symptoms & Conditions datastore
  • 12. Others • Performance benchmarks - Latency, Throughput, Concurrency - e.g. Graph DBs benchmark [6] • Operations & Maintenance - e.g. MySQL as backend data store for Facebook TAO [7], LinkedIn Espresso [8] • Support - Paid (single vendor vs. multiple), Community (size, composition) • Hosted service - on public clouds as a managed service
  • 13. What to look for? Under-the-hood considerations
  • 14. Database Type • Relational • All field values of a row stored together • Common storage formats: BTree • Better suited for OLTP • Columnar • All values of a column stored together • More efficient data compression • OLAP queries perform better Source: https://gerardnico.com/wiki/relation/structure/column_store
  • 15. Database Type • Document • Sub-class of a KV store • Often hierarchical (DB -> Collection -> Document) • Often have challenges in optimising storage - due to lack of Data Dictionary (schema-free) • KV • Often RAM based • Durability through replication(sync) and persistence to disk • Preference for LSM over in-place updates when designed for SSD Source: https://blog.mlab.com/2014/01/how-big-is-your-mongodb/ Source: http://www.aerospike.com/technologies/
  • 16. Data Organisation • B-Tree • Better suited for in-place updates • Log Structured Merged (LSM) • Better suited for high insert volume • Better suited for SSD (for reducing write amplification) • Achieve high data locality of reference through good row-key design [9] Source: http://www.programering.com/a/MTMwAzMwATM.html Source: http://www.cyanny.com/2014/03/13/hbase-architecture- analysis-part1-logical-architecture/
  • 17. Replication, Consensus • Replication • Sync vs. Async • No. of Replicas, Min. Replicas, Journalling, Guaranteed writes with hinted handoff • Single master read-write(CP) vs. Replica reads(AP) • Consensus • Used in • Leader election • Committing transactions/Log replication • Strength of protocol - Paxos, Raft, Zab etc. • Jepsen Tests (https://jepsen.io/) - Tests ‘Safety’ of distributed databases • e.g. CockroachDB, MongoDB, VoltDB, Solr, Elastic Search etc Source:https://martin.kleppmann.com Source: https://raft.github.io/raft.pdf
  • 18. Operations • Data export & restore (RPO, RTO) - Disaster Recovery(DR) • Tools for full export vs incremental snapshots • Tools for restoring from exports, logs • Piggy-back on XDC replication support to create continuous/ongoing backup&restore • Large scale data migration [10] • Mean Time to Recovery (MTTR) - Node failure/Minor outages • e.g. promoting hot-standby to master • Tools to detect failure, validate data, promote new master/leader
  • 19. Cost • Disk-Memory ratio • Database architecture to support disk storage, size of on-disk data w.r.t RAM • Compute required • No. of compute nodes required to keep data on-line • Power Consumption • SSD based databases generally more energy efficient than HDD • Density of storage • Relevant when storing large data over extended periods of time • e.g. Aadhaar enrolment raw data, Facebook photos [11]
  • 20. DB-specific Optimisations to leverage RAM, reduce Disk I/O • Data block-cache/buffer-pool • Reduces disk I/O • Provides lower latency on repeat reads • Provides potentially lower latency for reads on high data locality of reference • Bloom Filters • Reduces disk I/O and row scanning in random key lookups Source: https://sematext.com Source: Cloudera
  • 21. References • [1] - Google Spanner becoming a SQL System • [2] - CRDTs in Riak • [3] - Perspectives on the CAP Theorem • [4] - Martin Kleppmann CP or AP • [5] - Flipkart Catalog System, Datastore • [6] - Do We Need Specialised Graph Databases? • [7] - Facebook TAO social graph data store • [8] - LinkedIn Espresso • [9] - Facebook style notifications using HBase • [10] - Flipkart DC migration • [11] - Facebook cold storage system