The Data Mullet 
From All SQL to No SQL back to Some SQL 
Alexis Lê-Quôc @alq
This Talk 
• A (mostly) DIRTy Architecture for... 
• A new application (datadoghq.com) on a limited budget 
• Running on a public cloud 
• Focussing on data stores.
Some context
Servers 
Monitoring 
IaaS, PaaS Usage Analytics 
Perf. Management 
Apps 
Hosting 
CDNs Asset Management 
SDLC 
Ops team Dev team
Dev & Ops “collaborate” 
Concretely, what does Datadog do?
Watching real-time feeds 
Looking for patterns 
Constant telemetry 
Real-time 
Bursty batches 
Share
Data Taxonomy 
• Metrics: Unique visitors, Load, Transaction duration, ... 
• Events: Conversations, Alerts, Build & Deploys, ...
Unit of scale 
• 1 source, typically a server 
• 100 metrics 
• Every 15 s 
• 24,000 points per hour 
• ~3 bytes per point 
• 100 KB/hour, 850 MB/year 
• Events 
• whenever they occur 
• Highest resolution: 1s 
• Small payload + metadata
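The sizing above can be checked with a few lines of arithmetic. This is a minimal sketch using only the slide's round numbers; the function names are illustrative. At exactly 3 bytes per point the raw payload comes out somewhat below the slide's 100 KB/hour and 850 MB/year, which presumably round up or include some overhead (that reading is an assumption).

```python
# Back-of-the-envelope sizing for one source, using the slide's round numbers.
METRICS_PER_SOURCE = 100
REPORT_INTERVAL_S = 15
BYTES_PER_POINT = 3          # "~3 bytes per point" from the slide
HOURS_PER_YEAR = 24 * 365


def points_per_hour(metrics: int = METRICS_PER_SOURCE,
                    interval_s: int = REPORT_INTERVAL_S) -> int:
    """Number of data points one source emits per hour."""
    return metrics * (3600 // interval_s)


def bytes_per_hour(bytes_per_point: float = BYTES_PER_POINT) -> float:
    """Raw payload per hour, before any per-point or per-row overhead."""
    return points_per_hour() * bytes_per_point


def bytes_per_year(bytes_per_point: float = BYTES_PER_POINT) -> float:
    """Raw payload per year for one source."""
    return bytes_per_hour(bytes_per_point) * HOURS_PER_YEAR


if __name__ == "__main__":
    print(points_per_hour())           # 24000
    print(bytes_per_hour() / 1e3)      # 72.0 (KB/hour)
    print(bytes_per_year() / 1e6)      # 630.72 (MB/year)
```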
ACID, BASE & DIRT 
• ACID 
• http://en.wikipedia.org/wiki/ACID 
• BASE 
• http://en.wikipedia.org/wiki/Eventual_consistency 
• DIRT (Bryan Cantrill at Surge 2010) 
• http://dtrace.org/resources/bmc/DIRT.pdf
Let’s dig some DIRT
DI-RealTime
The Consequences of DIRT? 
Latency 
• Data consumed by people (and machines) 
• Low end-to-end latency (5-15s) 
• Psycho-physiological Factor 
• Same order of magnitude as email/SMS* 
* http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.76.2465&rep=rep1&type=pdf
The Consequences of DIRT? 
Concurrency 
• Concurrent events & data points show up in sync 
• Access Patterns? 
• All recent data, e.g. last 24 hours
The Consequences of DIRT? 
Tolerance to noise 
• Not a System of Record 
• “Real-time” decisions 
• Drop (some) individual data points rather than be late 
• Applies to metrics, not events
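One way to picture "drop rather than be late" is a bounded buffer that sheds the oldest metric points when the consumer falls behind. This is a sketch only: `MetricBuffer` and its capacity are hypothetical and not taken from the talk, and events would bypass such a buffer since the policy applies to metrics only.

```python
from collections import deque


class MetricBuffer:
    """Bounded buffer that sheds the oldest metric points when full.

    Hypothetical helper: events are never routed through it, because the
    drop-rather-than-wait policy is limited to metrics.
    """

    def __init__(self, max_points: int) -> None:
        # A deque with maxlen discards items from the left once it is full.
        self._points = deque(maxlen=max_points)
        self.dropped = 0

    def push(self, timestamp: float, value: float) -> None:
        if len(self._points) == self._points.maxlen:
            self.dropped += 1  # the oldest point is about to be evicted
        self._points.append((timestamp, value))

    def drain(self) -> list:
        """Return buffered points, oldest first, and empty the buffer."""
        points = list(self._points)
        self._points.clear()
        return points
```

Counting the drops keeps the noise measurable: a rising `dropped` value says the pipeline is trading accuracy for latency more often than intended.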
Figure: a spectrum from “Noise but no Latency” to “Latency but no Noise”. Cross here? Or here?
DataIntensive-RT
The Consequences of DIRT? 
Storage 
• Business Cycles 
• Retention Policy > Business Cycle 
• E.g. retail, education 12 months 
• Elastic Storage 
• !CAPEX
The Consequences of DIRT? 
Latency 
• Datadog, a data exploration app for people 
• Looking for patterns 
• Ideal: 300 ms round-trip 
• Access patterns for long-term data? 
• Storage trade-off: precompute oft-used properties 
• Run-time Trade-off: want longer timespan, get lower resolution 
• != RRD
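The run-time trade-off can be sketched as a rollup whose bucket width grows with the requested timespan, so the number of points returned stays bounded. The function name, the averaging choice, and the 300-bucket default are assumptions for illustration; the rollup is computed at read time here for brevity, whereas the storage trade-off above suggests precomputing oft-used aggregates.

```python
from statistics import mean


def rollup(points, start, end, max_buckets=300):
    """Average (timestamp, value) points into at most max_buckets buckets.

    The bucket width grows with the requested timespan, so a longer window
    comes back at a coarser resolution but a similar payload size.
    """
    if end <= start or max_buckets < 1:
        raise ValueError("need end > start and max_buckets >= 1")
    width = (end - start) / max_buckets
    buckets = {}
    for ts, value in points:
        if start <= ts < end:
            # Clamp to guard against floating-point rounding at the upper edge.
            index = min(int((ts - start) / width), max_buckets - 1)
            buckets.setdefault(index, []).append(value)
    # One averaged point per non-empty bucket, stamped at the bucket start.
    return [(start + i * width, mean(vals)) for i, vals in sorted(buckets.items())]
```

With one point every 15 s, an hour of data rolled into 60 buckets yields one averaged point per minute; asking for a full day with the same bucket count yields one point per 24 minutes.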
• Aggregate (BASE, DIRT): constant data influx, large data sets 
• Watch & Share (DIRT): real-time updates, on-the-fly data analysis 
• Look for Patterns (BASE): on-demand visualization, background data analysis 
Datadog = DIRT + BASE + a tiny bit of ACID
How It All Fits Together
The Mullet 
All SQL in front, NoSQL party in the back
Actual Stack
Choices, choices 
• 5 axes 
• Volume of Data 
• Latency 
• Ops: wake-up-in-the-middle-of-the-night factor 
• Dev: community & tools 
• Cost as in “a function of X”
Choosing Elastic Storage
Durable, Large-Scale Storage 
• Postgres 
• Itemized data points in a time series are useless 
• BLOB management not fun 
• Mongo 
• Durability in question in 2010 
• SciDB 
• Very very early 
• Our pick: Cassandra 
• (Riak)
Cassandra: Volume of Data 
• 100s of hosts, 150TB at FB in 2010 
• Easy to distribute data, durable quorum writes
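For readers new to quorum writes, the arithmetic is small enough to sketch. The helper names are illustrative and the replication factors in the usage note are example values; the talk does not state which replication factor Datadog ran.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be >= 1")
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_acks: int,
                           read_acks: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_acks + write_acks > replication_factor
```

With a replication factor of 3, `quorum(3)` is 2: a write acknowledged by two replicas survives the loss of any single node, and a quorum read is guaranteed to touch at least one replica that holds it.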
Cassandra: Latency 
• < 10ms on writes 
• reads more variable (on EC2)* 
* More on this in a bit
Cassandra: Ops 
• Release Engineering too aggressive 
• ~10 releases since 1/2011 on 0.7 branch 
• Good resilience to node loss in the later 0.7 versions 
• Annoying idiosyncrasies (cassandra.yaml, predictability of disk use)
Cassandra: Dev 
• Bizarre nomenclature (rows, columns... families?) 
• Cumbersome data access 
• Limited semantics if you are used to SQL 
• Good libraries
Cassandra: Cost 
• Ops time 
• I/O limits raised by increasing number of nodes 
• Thereby increasing costs
Riak 
• Prototyped out of spite for Cassandra 0.7[0123] 
• We ♡ Erlang 
• Great folks 
• But Cassandra pain subsided, priorities shifted. 
• git merge datadog/riak did not happen
Choosing In-Mem
In-memory DB 
• We started with Redis 
• Then we stopped looking :)
Redis 
• Volume of Data 
• Limited by available RAM, easy partitioning in our case 
• Latency 
• << 5 ms, dominated by network 
• Ops 
• Low-maintenance, stable, predictable, replicated, boringly rock-solid 
• Dev 
• Brilliant, clear docs, simple protocol, oft-used native data structures 
• Cost 
• ~ cost of RAM on EC2
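"Easy partitioning" for data that is naturally keyed (say, per source and metric) can be as simple as a stable hash over the key. This is a sketch under assumptions: the shard addresses and key format are hypothetical, and the talk does not describe the partitioning scheme actually used.

```python
import zlib

# Hypothetical shard addresses; the talk does not describe the actual layout.
SHARDS = ["redis-0:6379", "redis-1:6379", "redis-2:6379"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard with a stable hash so a key always maps to the same node."""
    if not shards:
        raise ValueError("at least one shard is required")
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]
```

Modulo hashing remaps most keys when the number of shards changes, which is tolerable for recent, reconstructible data but would call for consistent hashing otherwise.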
Choosing a SQL Data Store
General-purpose data store 
• We ♡ SQL 
• Oracle 
• Postgres
Oracle in numbers (list prices in $K) 
• base license 47.5 
• clustered db 23 
• replication 10 
• partitioning 11.5 
• analytics 23 
• in-mem cache 23 
• total: $138,000 
• for 2 cores 
• + 22% annual support 
• Just in licenses...
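The total is easy to verify. The sketch below treats the per-option figures as thousands of dollars, which is inferred from the $138,000 total, and assumes the 22% support fee is charged on the full license total.

```python
# Option prices in thousands of dollars, copied from the slide.
LICENSE_K = {
    "base license": 47.5,
    "clustered db": 23,
    "replication": 10,
    "partitioning": 11.5,
    "analytics": 23,
    "in-mem cache": 23,
}
SUPPORT_RATE = 0.22  # "+ 22% annual support"


def license_total_usd(prices_k=LICENSE_K) -> float:
    """Up-front license cost in dollars for the listed options."""
    return sum(prices_k.values()) * 1000


def annual_support_usd(prices_k=LICENSE_K, rate=SUPPORT_RATE) -> float:
    """Yearly support fee, assuming it is a flat rate on the license total."""
    return license_total_usd(prices_k) * rate
```

Under those assumptions the licenses come to $138,000 for 2 cores, plus roughly $30,000 per year in support.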
General-purpose data store 
• Oracle 
• Postgres
Postgres 
• Volume of Data 
• High GBs, Low TBs 
• Latency 
• 10-100 ms after EXPLAIN ANALYZE 
• Ops 
• Low-maintenance, stable, predictable, replicated, boringly rock-solid 
• Dev 
• Well understood by (a certain class of) engineers 
• Cost, a function of storage latency
Not forgetting... 
• VoltDB 
• RAM-based, potentially a match for our DIRTy parts 
• Stored procedures, an acquired taste 
• Home-grown data stores (soon) 
• Rainbird 
• ...
The Data Mullet 
• All open-source, good if you’re ready to dive in code 
• $0 CAPEX 
• All OPEX on EC2
The Data Mullet on EC2 
Structural Weakness: I/O latency at moderate throughputs
One “bad” Cassandra query 
Clogging the I/O pipes on EC2 
Maximum Average Wait: up to 670 ms 
Maximum Service Time: up to 5 ms 
While writing 100 MB/s 
and reading 30 MB/s
Time         DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util 
03:35:02 PM  dev8-80   380  24000     5.7       62        47        130    1.3    47 
03:35:02 PM  dev8-96   370  24000     5.6       63        46        120    1.2    45 
03:35:02 PM  dev8-112  380  24000     5.5       63        46        120    1.2    46 
03:35:02 PM  dev8-128  380  24000     7.2       63        56        150    1.3    50 
• tps: transfers per second (consumer HD: ~75 tps; SSD: 1-30 Ktps) 
• rd_sec/s: read throughput in sectors/s (total: 46 MB/s) 
• await: average wait in ms 
• svctm: average service time in ms 
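The 46 MB/s total follows from the sector counts, given the 512-byte sectors that sysstat tools report. The sketch below also compares wait time to service time; the function names are illustrative, and the sample tuples are copied from the slide.

```python
SECTOR_BYTES = 512  # sar/iostat report rd_sec/s in 512-byte sectors

# (device, tps, rd_sec/s, await ms, svctm ms) copied from the slide
SAMPLES = [
    ("dev8-80", 380, 24000, 130, 1.3),
    ("dev8-96", 370, 24000, 120, 1.2),
    ("dev8-112", 380, 24000, 120, 1.2),
    ("dev8-128", 380, 24000, 150, 1.3),
]


def read_mib_per_s(sectors_per_s: float) -> float:
    """Convert a sectors-per-second rate to MiB/s."""
    return sectors_per_s * SECTOR_BYTES / 2**20


def total_read_mib_per_s(samples=SAMPLES) -> float:
    """Aggregate read throughput across all sampled devices."""
    return sum(read_mib_per_s(s[2]) for s in samples)


def queueing_ratio(await_ms: float, svctm_ms: float) -> float:
    """How many times longer a request waits than the device spends serving it."""
    return await_ms / svctm_ms
```

Each device reads about 11.7 MiB/s, so the four together come to just under 47 MiB/s. The more telling figure is the ratio: requests wait on the order of a hundred times longer than they take to serve, which points at queueing rather than slow individual operations.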
Another “Bad” Query
Mitigation of I/O issues?
Cassandra: I/O Mitigation 
• More nodes, more RAM, more partitions, more parallelism 
• $$$
Postgres: I/O Mitigation 
• Scale up to a point 
• Replicate 
• Move to bare metal => $$$ 
• A well-trodden but difficult path
Better yet... 
• Less dependency on low-latency, durable storage 
• Move more data to RAM (Redis) 
• Archive immutable data 
• S3/Cloudfront?
A digression: 
Your Very Own Chaos Monkey 
• Instances go bye-bye 
• https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/741224 
• Instances go bye-bye, take 2 (high load) 
• https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/708920
Takeaway 
• By mixing and matching open-source SQL (PG) and NoSQL (Redis, 
Cassandra), Datadog has been able to quickly & simply get up and running 
with “$0” down payment on infrastructure.
http://datadoghq.com 
@datadoghq 
Alexis Lê-Quôc @alq

Más contenido relacionado

La actualidad más candente

La actualidad más candente (19)

關聯式vs非關聯式資料庫
關聯式vs非關聯式資料庫關聯式vs非關聯式資料庫
關聯式vs非關聯式資料庫
 
Best Practices for implementing Database Security Comprehensive Database Secu...
Best Practices for implementing Database Security Comprehensive Database Secu...Best Practices for implementing Database Security Comprehensive Database Secu...
Best Practices for implementing Database Security Comprehensive Database Secu...
 
Role-Based Access Control (RBAC) in Neo4j
Role-Based Access Control (RBAC) in Neo4jRole-Based Access Control (RBAC) in Neo4j
Role-Based Access Control (RBAC) in Neo4j
 
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
 
Oracle 12c and its pluggable databases
Oracle 12c and its pluggable databasesOracle 12c and its pluggable databases
Oracle 12c and its pluggable databases
 
Vertica
VerticaVertica
Vertica
 
Powerful Spatial Features You Never Knew Existed in Oracle Spatial and Graph ...
Powerful Spatial Features You Never Knew Existed in Oracle Spatial and Graph ...Powerful Spatial Features You Never Knew Existed in Oracle Spatial and Graph ...
Powerful Spatial Features You Never Knew Existed in Oracle Spatial and Graph ...
 
The Top 5 Reasons to Deploy Your Applications on Oracle RAC
The Top 5 Reasons to Deploy Your Applications on Oracle RACThe Top 5 Reasons to Deploy Your Applications on Oracle RAC
The Top 5 Reasons to Deploy Your Applications on Oracle RAC
 
SQL vs NoSQL | MySQL vs MongoDB Tutorial | Edureka
SQL vs NoSQL | MySQL vs MongoDB Tutorial | EdurekaSQL vs NoSQL | MySQL vs MongoDB Tutorial | Edureka
SQL vs NoSQL | MySQL vs MongoDB Tutorial | Edureka
 
NOSQLEU - Graph Databases and Neo4j
NOSQLEU - Graph Databases and Neo4jNOSQLEU - Graph Databases and Neo4j
NOSQLEU - Graph Databases and Neo4j
 
ASM
ASMASM
ASM
 
Relational Database to RDF (RDB2RDF)
Relational Database to RDF (RDB2RDF)Relational Database to RDF (RDB2RDF)
Relational Database to RDF (RDB2RDF)
 
Helping Small Companies Leverage CTI with an Open Source Threat Mapping
Helping Small Companies Leverage CTI with an Open Source Threat MappingHelping Small Companies Leverage CTI with an Open Source Threat Mapping
Helping Small Companies Leverage CTI with an Open Source Threat Mapping
 
Active directory
Active directoryActive directory
Active directory
 
Pgday bdr 천정대
Pgday bdr 천정대Pgday bdr 천정대
Pgday bdr 천정대
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative Facts
 
Architecture of exadata database machine – Part II
Architecture of exadata database machine – Part IIArchitecture of exadata database machine – Part II
Architecture of exadata database machine – Part II
 
Getting the most out of MariaDB MaxScale
Getting the most out of MariaDB MaxScaleGetting the most out of MariaDB MaxScale
Getting the most out of MariaDB MaxScale
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
 

Destacado

Fact based monitoring
Fact based monitoringFact based monitoring
Fact based monitoring
Datadog
 
How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...
Jos Boumans
 

Destacado (19)

Treating Infrastructure as Garbage
Treating Infrastructure as GarbageTreating Infrastructure as Garbage
Treating Infrastructure as Garbage
 
Alerting: more signal, less noise, less pain
Alerting: more signal, less noise, less painAlerting: more signal, less noise, less pain
Alerting: more signal, less noise, less pain
 
DevOps, continuous delivery, & the new composable enterprise
DevOps, continuous delivery, & the new composable enterpriseDevOps, continuous delivery, & the new composable enterprise
DevOps, continuous delivery, & the new composable enterprise
 
Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015
Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015
Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015
 
Monitoring MySQL at scale
Monitoring MySQL at scaleMonitoring MySQL at scale
Monitoring MySQL at scale
 
I &lt;3 graphs in 20 slides
I &lt;3 graphs in 20 slidesI &lt;3 graphs in 20 slides
I &lt;3 graphs in 20 slides
 
Events and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of WebopsEvents and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of Webops
 
Big (IT) data
Big (IT) dataBig (IT) data
Big (IT) data
 
Fact based monitoring
Fact based monitoringFact based monitoring
Fact based monitoring
 
Deep dive into Nagios analytics
Deep dive into Nagios analyticsDeep dive into Nagios analytics
Deep dive into Nagios analytics
 
Just enough web ops for web developers
Just enough web ops for web developersJust enough web ops for web developers
Just enough web ops for web developers
 
Making Cassandra Perform as a Time Series Database - Cassandra Summit 15
Making Cassandra Perform as a Time Series Database - Cassandra Summit 15Making Cassandra Perform as a Time Series Database - Cassandra Summit 15
Making Cassandra Perform as a Time Series Database - Cassandra Summit 15
 
Customer Ops: DevOps &lt;3 customer support
Customer Ops: DevOps &lt;3 customer supportCustomer Ops: DevOps &lt;3 customer support
Customer Ops: DevOps &lt;3 customer support
 
Effective monitoring with StatsD
Effective monitoring with StatsDEffective monitoring with StatsD
Effective monitoring with StatsD
 
Monitoring Docker containers - Docker NYC Feb 2015
Monitoring Docker containers - Docker NYC Feb 2015Monitoring Docker containers - Docker NYC Feb 2015
Monitoring Docker containers - Docker NYC Feb 2015
 
Monitoring NGINX (plus): key metrics and how-to
Monitoring NGINX (plus): key metrics and how-toMonitoring NGINX (plus): key metrics and how-to
Monitoring NGINX (plus): key metrics and how-to
 
PyData NYC 2015 - Automatically Detecting Outliers with Datadog
PyData NYC 2015 - Automatically Detecting Outliers with Datadog PyData NYC 2015 - Automatically Detecting Outliers with Datadog
PyData NYC 2015 - Automatically Detecting Outliers with Datadog
 
How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...
 
Application Monitoring using Datadog
Application Monitoring using DatadogApplication Monitoring using Datadog
Application Monitoring using Datadog
 

Similar a The Data Mullet: From all SQL to No SQL back to Some SQL

Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Lucidworks
 
Meetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMeetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebService
Minsk MongoDB User Group
 
London devops logging
London devops loggingLondon devops logging
London devops logging
Tomas Doran
 

Similar a The Data Mullet: From all SQL to No SQL back to Some SQL (20)

Scality S3 Server: Node js Meetup Presentation
Scality S3 Server: Node js Meetup PresentationScality S3 Server: Node js Meetup Presentation
Scality S3 Server: Node js Meetup Presentation
 
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenJ1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
 
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
 
Webinar: The Future of SQL
Webinar: The Future of SQLWebinar: The Future of SQL
Webinar: The Future of SQL
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
 
SFScon18 - Stefano Pampaloni - The SQL revenge
SFScon18 - Stefano Pampaloni - The SQL revengeSFScon18 - Stefano Pampaloni - The SQL revenge
SFScon18 - Stefano Pampaloni - The SQL revenge
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
 
Cost Effectively Run Multiple Oracle Database Copies at Scale
Cost Effectively Run Multiple Oracle Database Copies at Scale Cost Effectively Run Multiple Oracle Database Copies at Scale
Cost Effectively Run Multiple Oracle Database Copies at Scale
 
Log Analysis At Scale
Log Analysis At ScaleLog Analysis At Scale
Log Analysis At Scale
 
Meetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebServiceMeetup#2: Building responsive Symbology & Suggest WebService
Meetup#2: Building responsive Symbology & Suggest WebService
 
NewSQL - Deliverance from BASE and back to SQL and ACID
NewSQL - Deliverance from BASE and back to SQL and ACIDNewSQL - Deliverance from BASE and back to SQL and ACID
NewSQL - Deliverance from BASE and back to SQL and ACID
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data Service
 
Best of re:Invent
Best of re:InventBest of re:Invent
Best of re:Invent
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
Riga dev day: Lambda architecture at AWS
Riga dev day: Lambda architecture at AWSRiga dev day: Lambda architecture at AWS
Riga dev day: Lambda architecture at AWS
 
London devops logging
London devops loggingLondon devops logging
London devops logging
 
AWS Lambda support for AWS X-Ray
AWS Lambda support for AWS X-RayAWS Lambda support for AWS X-Ray
AWS Lambda support for AWS X-Ray
 
0bbleedingedge long-140614012258-phpapp02 lynn-langit
0bbleedingedge long-140614012258-phpapp02 lynn-langit0bbleedingedge long-140614012258-phpapp02 lynn-langit
0bbleedingedge long-140614012258-phpapp02 lynn-langit
 
Bleeding Edge Databases
Bleeding Edge DatabasesBleeding Edge Databases
Bleeding Edge Databases
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
 

Más de Datadog

Más de Datadog (15)

What it Means to be a Next-Generation Managed Service Provider
What it Means to be a Next-Generation Managed Service ProviderWhat it Means to be a Next-Generation Managed Service Provider
What it Means to be a Next-Generation Managed Service Provider
 
Lifting the Blinds: Monitoring Windows Server 2012
Lifting the Blinds: Monitoring Windows Server 2012Lifting the Blinds: Monitoring Windows Server 2012
Lifting the Blinds: Monitoring Windows Server 2012
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloud
 
Datadog + VictorOps Webinar
Datadog + VictorOps WebinarDatadog + VictorOps Webinar
Datadog + VictorOps Webinar
 
Dataday Texas 2016 - Datadog
Dataday Texas 2016 - DatadogDataday Texas 2016 - Datadog
Dataday Texas 2016 - Datadog
 
Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015
Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015
Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015
 
Running & Monitoring Docker at Scale
Running & Monitoring Docker at ScaleRunning & Monitoring Docker at Scale
Running & Monitoring Docker at Scale
 
Fact-Based Monitoring
Fact-Based MonitoringFact-Based Monitoring
Fact-Based Monitoring
 
What’s in this Cookbook? - Mike Fiedler
What’s in this Cookbook? - Mike FiedlerWhat’s in this Cookbook? - Mike Fiedler
What’s in this Cookbook? - Mike Fiedler
 
I Love Graphs - Alexis Lê-Quôc
I Love Graphs - Alexis Lê-QuôcI Love Graphs - Alexis Lê-Quôc
I Love Graphs - Alexis Lê-Quôc
 
Virtualization at Gilt - Rangarajan Radhakrishnan
Virtualization at Gilt - Rangarajan RadhakrishnanVirtualization at Gilt - Rangarajan Radhakrishnan
Virtualization at Gilt - Rangarajan Radhakrishnan
 
Why Puppet Sucks - Rob Terhaar
Why Puppet Sucks - Rob TerhaarWhy Puppet Sucks - Rob Terhaar
Why Puppet Sucks - Rob Terhaar
 
Welcome to a Computing Revolution - Alex Lesser
Welcome to a Computing Revolution - Alex LesserWelcome to a Computing Revolution - Alex Lesser
Welcome to a Computing Revolution - Alex Lesser
 
Cosa Nostra - Tom Santero
Cosa Nostra - Tom SanteroCosa Nostra - Tom Santero
Cosa Nostra - Tom Santero
 
Bulk Exporting from Cassandra - Carlo Cabanilla
Bulk Exporting from Cassandra - Carlo CabanillaBulk Exporting from Cassandra - Carlo Cabanilla
Bulk Exporting from Cassandra - Carlo Cabanilla
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Último (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 

The Data Mullet: From all SQL to No SQL back to Some SQL

  • 1. The Data Mullet From All SQL to No SQL back to Some SQL Alexis Lê-Quôc @alq
  • 2. The Data Mullet From All SQL to No SQL back to Some SQL Alexis Lê-Quôc @alq
  • 3. Alexis Lê-Quôc @alq This Talk • A (mostly) DIRTy Architecture for... • A new application (datadoghq.com) on a limited budget • Running on a public cloud • Focussing on data stores.
  • 5. Servers Monitoring IaaS, PaaS Usage Analytics Perf. Management Apps Hosting CDNs Asset Management SDLC Ops team Dev team
  • 6.
  • 7. Dev & Ops “collaborate” Alexis Lê-Quôc @alq
  • 8. Concretely, what does Datadog do?
  • 10. Watching real time feeds Looking for patterns Alexis Lê-Quôc @alq Constant telemetry Real-time Bursty batches Share
  • 11. Alexis Lê-Quôc @alq Data Taxonomy Metrics Unique visitors Load Transaction duration ... Events Conversations Alerts Build & Deploys ...
  • 12. Alexis Lê-Quôc @alq Unit of scale • 1 source, typically a server • 100 metrics • Every 15 s • 24,000 points per hour • ~3 bytes per point • 100 KB/hour, 850 MB/year • Events • whenever they occur • Highest resolution: 1s • Small payload + metadata
  • 13. ACID, BASE & DIRT • ACID • http://en.wikipedia.org/wiki/ACID • BASE • http://en.wikipedia.org/wiki/Eventual_consistency • DIRT (Bryan Cantrill at Surge 2010) • http://dtrace.org/resources/bmc/DIRT.pdf
  • 16. The Consequences of DIRT? Latency • Data consumed by people (and machines) • Low end-to-end latency (5-15 s) • Psycho-physiological factor • Same order of magnitude as email/SMS* • * http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.76.2465&rep=rep1&type=pdf
  • 17. The Consequences of DIRT? Concurrency • Concurrent events & data points show up in sync • Access patterns? • All recent data, e.g. last 24 hours
  • 18. The Consequences of DIRT? Tolerance to noise • Not a System of Record • “Real-time” decisions • Drop (some) individual data points rather than be late • Applies to metrics, not events
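
One possible way to express "drop rather than be late" for metrics is a bounded buffer that sheds the oldest points when the consumer falls behind. The class below is a hypothetical illustration, not Datadog's implementation; `LossyMetricBuffer` and its capacity are invented names and values. Events, which the slide excludes from this policy, would need a lossless queue instead.

```python
from collections import deque


class LossyMetricBuffer:
    """Bounded buffer that discards the oldest metric points when full."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently evicts from the left when appending to a full deque.
        self._points = deque(maxlen=capacity)
        self.dropped = 0

    def push(self, point):
        """Add a point, counting how many stale points were sacrificed."""
        if len(self._points) == self._points.maxlen:
            self.dropped += 1
        self._points.append(point)

    def drain(self):
        """Return the buffered points, oldest first, and empty the buffer."""
        points = list(self._points)
        self._points.clear()
        return points


if __name__ == "__main__":
    buf = LossyMetricBuffer(capacity=3)
    for value in range(5):
        buf.push(value)
    print(buf.drain(), buf.dropped)
```

Counting the dropped points keeps the noise visible, so the trade between freshness and completeness can itself be monitored.
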
  • 19. Noise but no Latency • Latency but no Noise • Cross here? Or here?
  • 21. The Consequences of DIRT? Storage • Business Cycles • Retention Policy > Business Cycle • E.g. retail, education: 12 months • Elastic Storage • !CAPEX
  • 22. The Consequences of DIRT? Latency • Datadog, a data exploration app for people • Looking for patterns • Ideal: 300 ms round-trip • Access patterns for long-term data? • Storage trade-off: precompute oft-used properties • Run-time trade-off: want longer timespan, get lower resolution • != RRD
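
The run-time trade-off on slide 22 (longer timespan, lower resolution) can be sketched as a read-time rollup. This is an assumption-laden illustration rather than the talk's actual code: the 300-point budget, the averaging aggregate, and both function names are hypothetical; only the 15 s base interval comes from the deck.

```python
import math


def rollup_interval(span_s, max_points=300, base_interval_s=15):
    """Smallest multiple of the base interval that keeps a query under max_points."""
    needed = math.ceil(span_s / max_points)
    multiples = max(1, math.ceil(needed / base_interval_s))
    return multiples * base_interval_s


def downsample(points, interval_s):
    """Average (timestamp, value) pairs into buckets of interval_s seconds."""
    buckets = {}
    for ts, value in points:
        start = ts - ts % interval_s
        total, count = buckets.get(start, (0.0, 0))
        buckets[start] = (total + value, count + 1)
    return [(start, total / count) for start, (total, count) in sorted(buckets.items())]


if __name__ == "__main__":
    print(rollup_interval(3600))      # one hour stays at full resolution
    print(rollup_interval(86400))     # one day is rolled up to coarser buckets
    print(downsample([(0, 1.0), (15, 3.0), (30, 5.0), (45, 7.0)], 30))
```

With these assumed defaults, a one-hour query keeps the native 15 s resolution while a one-day query is served at 300 s resolution, so the number of points returned stays roughly constant as the timespan grows.
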
  • 24. Aggregate • Constant data influx • Large data sets
  • 25. Aggregate • Constant data influx • Large data sets • Watch & Share • Real-time updates • On-the-fly data analysis
  • 26. Aggregate • Constant data influx • Large data sets • Watch & Share • Real-time updates • On-the-fly data analysis • Look for Patterns • On-demand visualization • Background data analysis
  • 27. Aggregate (BASE, DIRT) • Constant data influx • Large data sets • Watch & Share • Real-time updates • On-the-fly data analysis • Look for Patterns • On-demand visualization • Background data analysis
  • 28. Aggregate (BASE, DIRT) • Constant data influx • Large data sets • Watch & Share (DIRT) • Real-time updates • On-the-fly data analysis • Look for Patterns • On-demand visualization • Background data analysis
  • 29. Aggregate (BASE, DIRT) • Constant data influx • Large data sets • Watch & Share (DIRT) • Real-time updates • On-the-fly data analysis • Look for Patterns (BASE) • On-demand visualization • Background data analysis
  • 30. Aggregate (BASE, DIRT) • Constant data influx • Large data sets • Watch & Share (DIRT) • Real-time updates • On-the-fly data analysis • Look for Patterns (BASE) • On-demand visualization • Background data analysis • Datadog = DIRT + BASE + a tiny bit of ACID
  • 31. How It All Fits Together
  • 32. The Mullet: All SQL in front, NoSQL party in the back
  • 33. Actual Stack
  • 34. Choices, choices • 5 axes • Volume of Data • Latency • Ops: wake-up-in-the-middle-of-the-night factor • Dev: community & tools • Cost as in “a function of X”
  • 36. Durable, Large-Scale Storage • Postgres • Mongo • Cassandra • (Riak) • SciDB
  • 37. Durable, Large-Scale Storage • Postgres • Itemized data points in a time series are useless • BLOB management not fun • Mongo • Cassandra • (Riak) • SciDB
  • 38. Durable, Large-Scale Storage • Postgres • Mongo • SciDB • Cassandra • (Riak)
  • 39. Durable, Large-Scale Storage • Postgres • Mongo • Durability in question in 2010 • SciDB • Cassandra • (Riak)
  • 40. Durable, Large-Scale Storage • Postgres • Mongo • SciDB • Very very early • Cassandra • (Riak)
  • 41. Durable, Large-Scale Storage • Postgres • Mongo • SciDB • Our pick: Cassandra • (Riak)
  • 42. Cassandra: Volume of Data • 100s of hosts, 150 TB at FB in 2010 • Easy to distribute data, durable quorum writes
  • 43. Cassandra: Latency • < 10 ms on writes • Reads more variable (on EC2)* • * More on this in a bit
  • 44. Cassandra: Ops • Release Engineering too aggressive • ~10 releases since 1/2011 on 0.7 branch • Good resilience to node loss in the later 0.7 versions • Annoying idiosyncrasies (cassandra.yaml, predictability of disk use)
  • 45. Cassandra: Dev • Bizarre nomenclature (rows, columns... families?) • Cumbersome data access • Limited semantics when you are used to SQL • Good libraries
  • 46. Cassandra: Cost • Ops time • I/O limits raised by increasing number of nodes • Thereby increasing costs
  • 47. Riak • Prototyped out of spite for Cassandra 0.7[0123] • We ♡ Erlang • Great folks • But Cassandra pain subsided, priorities shifted. • git merge datadog/riak did not happen
  • 49. In-memory DB • We started with Redis • Then we stopped looking :)
  • 50. Redis • Volume of Data • Limited by available RAM, easy partitioning in our case • Latency • << 5 ms, dominated by network • Ops • Low-maintenance, stable, predictable, replicated, boringly rock-solid • Dev • Brilliant, clear docs, simple protocol, oft-used native data structures • Cost • ~ cost of RAM on EC2
  • 51. Choosing a SQL Data Store
  • 52. General-purpose data store • We ♡ SQL • Oracle • Postgres
  • 53. Oracle in numbers (in thousands of USD) • base license 47.5 • clustered db 23 • replication 10 • partitioning 11.5 • analytics 23 • in-mem cache 23 • total: $138,000
  • 54. Oracle in numbers (in thousands of USD) • base license 47.5 • clustered db 23 • replication 10 • partitioning 11.5 • analytics 23 • in-mem cache 23 • total: $138,000 • for 2 cores • + 22% annual support • Just in licenses...
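
The Oracle line items add up as stated, and the support fee can be projected over a few years. In this sketch the item amounts are the slide's figures read as thousands of USD; the assumption that the 22% support fee is charged each year on the full license total is mine, and the function names are hypothetical.

```python
# License line items from the slide, in thousands of USD, for 2 cores.
LICENSE_KUSD = {
    "base license": 47.5,
    "clustered db": 23,
    "replication": 10,
    "partitioning": 11.5,
    "analytics": 23,
    "in-mem cache": 23,
}
SUPPORT_PERCENT = 22  # "+ 22% annual support"


def license_total_usd():
    """Up-front license cost in USD."""
    return sum(LICENSE_KUSD.values()) * 1000


def annual_support_usd():
    """Yearly support fee, assuming it applies to the full license total."""
    return license_total_usd() * SUPPORT_PERCENT / 100


def cumulative_cost_usd(years):
    """License plus support after a given number of years."""
    return license_total_usd() + years * annual_support_usd()


if __name__ == "__main__":
    print(license_total_usd())
    print(annual_support_usd())
    print(cumulative_cost_usd(3))
```

Under that assumption, support alone adds a little over $30,000 per year to the $138,000 license bill, before any hardware or staffing cost.
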
  • 56. General-purpose data store • Oracle • Postgres
  • 57. Postgres • Volume of Data • High GBs, Low TBs • Latency • 10-100 ms after EXPLAIN ANALYZE • Ops • Low-maintenance, stable, predictable, replicated, boringly rock-solid • Dev • Well understood by (a certain class of) engineers • Cost, a function of storage latency
  • 58. Not forgetting... • VoltDB • RAM-based, potentially a match for our DIRTy parts • Stored procedures, an acquired taste • Home-grown data stores (soon) • Rainbird • ...
  • 59. The Data Mullet • All open-source, good if you’re ready to dive in code • $0 CAPEX • All OPEX on EC2
  • 60. The Data Mullet on EC2 • Structural weakness: I/O latency at moderate throughputs
  • 61. One “bad” Cassandra query
  • 62. Clogging the I/O pipes on EC2 • Maximum Average Wait: up to 670 ms • Maximum Service Time: up to 5 ms • While writing 100 MB/s and reading 30 MB/s
  • 63. Another “Bad” Query • Average wait in ms (await) • Transfers per second (tps); consumer HD: ~75 tps, SSD: 1-30 Ktps • Average service time in ms (svctm) • Read throughput in sectors/s (rd_sec/s); total: 46 MB/s
      Time         DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
      03:35:02 PM  dev8-80   380     24000       5.7        62        47    130    1.3     47
      03:35:02 PM  dev8-96   370     24000       5.6        63        46    120    1.2     45
      03:35:02 PM  dev8-112  380     24000       5.5        63        46    120    1.2     46
      03:35:02 PM  dev8-128  380     24000       7.2        63        56    150    1.3     50
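
The numbers on slide 63 hang together, which a few lines of arithmetic can confirm. This is a sketch under two assumptions: sectors are 512 bytes (the unit Linux device statistics use), and the slide's "46 MB/s" is counted in units of 2^20 bytes. The helper names are hypothetical; the row values are copied from the slide.

```python
SECTOR_BYTES = 512  # assumption: device statistics report 512-byte sectors

# (device, tps, rd_sec/s, await in ms, svctm in ms), copied from the slide.
ROWS = [
    ("dev8-80", 380, 24000, 130, 1.3),
    ("dev8-96", 370, 24000, 120, 1.2),
    ("dev8-112", 380, 24000, 120, 1.2),
    ("dev8-128", 380, 24000, 150, 1.3),
]


def total_read_mib_per_s(rows):
    """Sum read throughput across devices, in MiB/s."""
    return sum(rd_sectors * SECTOR_BYTES for _, _, rd_sectors, _, _ in rows) / 2**20


def estimated_queue_depth(tps, await_ms):
    """Little's law: requests in flight = arrival rate x time in system."""
    return tps * await_ms / 1000


def estimated_utilization_pct(tps, svctm_ms):
    """Fraction of each second the device spends servicing requests, in percent."""
    return tps * svctm_ms / 10


if __name__ == "__main__":
    print(total_read_mib_per_s(ROWS))
    for dev, tps, _, await_ms, svctm_ms in ROWS:
        print(dev, estimated_queue_depth(tps, await_ms), estimated_utilization_pct(tps, svctm_ms))
```

The estimates land close to the reported avgqu-sz and %util columns. Each request spends about 1 ms being serviced but more than 100 ms waiting, so the latency comes from queueing rather than from the device's service time.
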
  • 64. Mitigation of I/O issues?
  • 65. Cassandra: I/O Mitigation • More nodes, more RAM, more partitions, more parallelism • $$$
  • 66. Postgres: I/O Mitigation • Scale up to a point • Replicate • Move to bare metal => $$$ • A well-trodden but difficult path
  • 67. Better yet... • Less dependency on low-latency, durable storage • Move more data to RAM (Redis) • Archive immutable data • S3/Cloudfront?
  • 68. A digression: Your Very Own Chaos Monkey • Instances go bye-bye • https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/741224 • Instances go bye-bye, take 2 (high load) • https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/708920
  • 69. Takeaway • By mixing and matching open-source SQL (PG) and NoSQL (Redis, Cassandra), Datadog has been able to quickly & simply get up and running with “$0” down payment on infrastructure.