Measure All The Things!
Gary Dusbabek
Rackspace
@gdusbabek
Motivation
What You Really Want
Kinds of Metrics
How To Do It
Prognostication
Motivation
It’s all about the data
We are generating data at an insane rate.
2006
IDC estimates 161 exabytes of digital data

That is 161 million 1 TB drives
2010 (IDC projection)
988 exabytes of data
6x growth in 4 years
Almost 1 billion 1 TB drives
A zettabyte: 21 zeroes

Source: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
2012
The Internet was estimated to be shipping roughly 2.5 exabytes of data daily.

Daily!

Not counting the NSA
Transferring data generates data.

Metadata!
Secondary information
A by-product
Example 1: Cloud Monitoring

Is the website up?
GET HTTP/1.1
Status=200
Bytes=432
Time to connect=15ms
Time to first byte=21ms
Duration=28ms
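Each check produces several numbers at once, and each of them is a small time series of its own. A minimal sketch in Python of flattening one result into timestamped samples; the `HttpCheckResult` record and the `website.check.*` naming scheme are hypothetical, not part of the talk:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpCheckResult:
    """One 'is the website up?' observation; fields mirror the slide."""
    timestamp: int       # seconds since the epoch when the check ran
    status: int          # HTTP status code, e.g. 200
    size_bytes: int      # response size
    connect_ms: int      # time to connect
    first_byte_ms: int   # time to first byte
    duration_ms: int     # total request duration


def to_samples(result: HttpCheckResult, prefix: str = "website.check"):
    """Flatten one check into (metric name, timestamp, value) samples.

    The metric names are hypothetical; use whatever scheme your system uses.
    """
    fields = {
        "status": result.status,
        "bytes": result.size_bytes,
        "connect_ms": result.connect_ms,
        "first_byte_ms": result.first_byte_ms,
        "duration_ms": result.duration_ms,
    }
    return [(f"{prefix}.{name}", result.timestamp, value) for name, value in fields.items()]


check = HttpCheckResult(timestamp=1_400_000_000, status=200, size_bytes=432,
                        connect_ms=15, first_byte_ms=21, duration_ms=28)
for sample in to_samples(check):
    print(sample)
```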
Example 2: Netflix

You want to watch an episode of Buffy.

Observations:
What titles you click on
What time of day you started watching
When you paused
Parts you re-watched
When you finished (if you finished)
Useless to people consuming the primary data.
Priceless when you’re trying to understand behavior.
Understanding = Knowledge
In these cases, all the data generated is time series.
Time Series Data
Related events sorted by time of occurrence
Example
0600 – Wake up
0601 – Checked Hacker News
0605 – Shower
0630 – Breakfast
0630 – Checked Hacker News
0700 – Left for work
0730 – Arrived at work
Etc…
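One way to keep events sorted by time of occurrence, sketched in Python under two assumptions: times are zero-padded "HHMM" strings (which sort correctly as text), and the `TimeSeries` class is a hypothetical illustration, not a library API:

```python
import bisect


class TimeSeries:
    """Related events kept sorted by time of occurrence."""

    def __init__(self):
        self._times = []   # sorted timestamps
        self._events = []  # events, parallel to _times

    def add(self, time, event):
        # bisect_right places an event after any existing ones with the same
        # time, so ties (two events at 0630) stay in arrival order.
        i = bisect.bisect_right(self._times, time)
        self._times.insert(i, time)
        self._events.insert(i, event)

    def between(self, start, end):
        """Return (time, event) pairs with start <= time < end, oldest first."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_left(self._times, end)
        return list(zip(self._times[lo:hi], self._events[lo:hi]))


day = TimeSeries()
# Events may arrive out of order; the series still comes back sorted.
day.add("0605", "Shower")
day.add("0600", "Wake up")
day.add("0630", "Breakfast")
day.add("0601", "Checked Hacker News")
day.add("0630", "Checked Hacker News")
print(day.between("0600", "0630"))
```

Inserting into a Python list is linear in its length, so this only models the idea; a real store keeps the data sorted on disk.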
Think about how you’d store something like this if you were building a backend system.
Relational Database Much?
You
When  What
0600  Wake up
0601  Checked Hacker News
0605  Shower
0630  Breakfast
0630  Checked Hacker News
0700  Left for work
0730  Arrive at work
0731  Checked Hacker News
Who  When  What
You  0600  Wake up
You  0601  Checked Hacker News
You  0605  Shower
You  0630  Breakfast
You  0630  Checked Hacker News
You  0700  Left for work
You  0730  Arrive at work
You  0731  Checked Hacker News
Who     When  What
You     0600  Wake up
You     0601  Checked Hacker News
Friend  0603  Wake up
Friend  0604  Checked Hacker News
You     0605  Shower
You     0630  Breakfast
You     0630  Checked Hacker News
You     0700  Left for work
Friend  0715  Left for work
You     0730  Arrive at work
You     0731  Checked Hacker News
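The Who/When/What table can be tried out with Python's built-in `sqlite3` module. A minimal sketch; the `events` table, its columns and the `timeline` helper are hypothetical names:

```python
import sqlite3


def build_events_db(rows):
    """Create an in-memory table shaped like the Who/When/What slide."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (who TEXT NOT NULL, occurred_at TEXT NOT NULL, what TEXT NOT NULL)"
    )
    # Time-series queries filter on who and a time range, so index both.
    conn.execute("CREATE INDEX events_who_time ON events (who, occurred_at)")
    conn.executemany("INSERT INTO events (who, occurred_at, what) VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def timeline(conn, who, start, end):
    """Events for one person with start <= time < end, in time order."""
    return conn.execute(
        "SELECT occurred_at, what FROM events"
        " WHERE who = ? AND occurred_at >= ? AND occurred_at < ?"
        " ORDER BY occurred_at, rowid",  # rowid keeps ties in insertion order
        (who, start, end),
    ).fetchall()


rows = [
    ("You", "0600", "Wake up"),
    ("You", "0601", "Checked Hacker News"),
    ("Friend", "0603", "Wake up"),
    ("Friend", "0604", "Checked Hacker News"),
    ("You", "0605", "Shower"),
    ("Friend", "0715", "Left for work"),
]
conn = build_events_db(rows)
print(timeline(conn, "Friend", "0000", "2400"))
```

Note that every row repeats `who`, and nearly every question becomes a range scan over the (who, time) index.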
Other Ways?
Less Appealing
Column Oriented
You     0600: Wake up | 0601: Checked Hacker News | 0605: Shower | 0630: Breakfast | 0630: Checked Hacker News | 0700: Left for work | 0730: Arrive at work | 0731: Checked Hacker News
Friend  0603: Wake up | 0604: Checked Hacker News | 0715: Left for work
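In the column-oriented layout each person is one row and the row's columns are kept sorted by time, so reading part of a day means slicing one row. A toy in-memory model in Python; `WideRowStore` is hypothetical and only imitates how a wide-row store such as Cassandra orders columns within a row:

```python
import bisect
from collections import defaultdict


class WideRowStore:
    """Toy column-oriented layout: one row per person,
    column name = time of the event, column value = the event."""

    def __init__(self):
        self._columns = defaultdict(list)  # row key -> sorted column names
        self._values = defaultdict(dict)   # row key -> {column name: value}
        self._seq = 0                      # tie-breaker for equal times

    def put(self, row, time, value):
        # Column names must be unique within a row, so pair the time with a
        # sequence number; otherwise two events at 0630 would overwrite each other.
        column = (time, self._seq)
        self._seq += 1
        bisect.insort(self._columns[row], column)
        self._values[row][column] = value

    def slice(self, row, start, end):
        """Columns of one row with start <= time < end, in time order."""
        columns = self._columns.get(row, [])
        # (t,) sorts before every (t, seq), so these bounds select whole times.
        lo = bisect.bisect_left(columns, (start,))
        hi = bisect.bisect_left(columns, (end,))
        return [(time, self._values[row][(time, seq)]) for time, seq in columns[lo:hi]]


store = WideRowStore()
for time, event in [("0600", "Wake up"), ("0601", "Checked Hacker News"),
                    ("0630", "Breakfast"), ("0630", "Checked Hacker News")]:
    store.put("You", time, event)
store.put("Friend", "0603", "Wake up")
print(store.slice("You", "0630", "0700"))
```

Pairing the time with a sequence number is a design choice for this sketch, made so that simultaneous events survive.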
What You Really Want
You run a business.
You want to make money.

Show me the money!
You need to make decisions.
You need to make the right decisions.
How do you do that?
With your gut
With data
Example
API responses are taking a long time.
It’s probably the database.
You add a few indexes.
You allocate more memory.
You get faster disks.
You get bigger processors.
Maybe it’s the network…
You replace Ethernet adapters.
You get faster switches.
You replace the cabling.
Crap!
Trace it!
500 ms for the entire request:
15 ms on the wire getting there
200 ms to auth
50 ms looking up account
50 ms looking up other stuff
15 ms on the wire getting back
170 ms rendering in the browser
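The lesson is to measure before buying hardware. A minimal sketch in Python: `span` is a hypothetical helper that times a block of code, and `breakdown` ranks the measured steps by cost. Fed the numbers from the slide, auth (200 ms) is 40% of the 500 ms request and browser rendering (170 ms) another 34%, while the network accounts for only 30 ms:

```python
import time
from contextlib import contextmanager


@contextmanager
def span(trace, name):
    """Time the enclosed block and append (name, elapsed milliseconds) to trace."""
    start = time.perf_counter()
    try:
        yield
    finally:
        trace.append((name, (time.perf_counter() - start) * 1000.0))


def breakdown(spans):
    """Sort spans by cost and attach each one's share of the total, in percent."""
    total = sum(ms for _, ms in spans)
    rows = [(name, ms, 100.0 * ms / total) for name, ms in spans]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Instrumenting real code: wrap each step in a span.
trace = []
with span(trace, "auth"):
    time.sleep(0.01)  # stand-in for the real auth call

# The numbers from the slide, fed straight into the breakdown.
slide_spans = [
    ("wire (there)", 15), ("auth", 200), ("account lookup", 50),
    ("other lookups", 50), ("wire (back)", 15), ("browser rendering", 170),
]
for name, ms, pct in breakdown(slide_spans):
    print(f"{name:18} {ms:4d} ms {pct:5.1f}%")
```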
Make the right decisions with data.
You need a metrics system
Take these things into account:

Availability
Redundancy
Accuracy
And your budget
Example: Pretty Graphs
If graphs go away, do you lose money?
The CEO likes them.
Do graphs help you make decisions?
Example: Usage Billing
Will losing data cost you money?
Data Lifecycle
When can I throw it away?
How much work is throwing it away?
More work means it probably won’t happen.
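One way to make throwing data away cheap is to store it in time buckets and drop whole buckets at once. A minimal sketch in Python, assuming samples carry Unix timestamps and retention is measured in days; `DailyBuckets` is hypothetical:

```python
from collections import defaultdict

SECONDS_PER_DAY = 86_400


class DailyBuckets:
    """Samples grouped by UTC day, so expiring old data means dropping
    whole buckets instead of finding and deleting individual samples."""

    def __init__(self, retention_days):
        self.retention_days = retention_days
        self._buckets = defaultdict(list)  # day number -> [(timestamp, value)]

    def add(self, timestamp, value):
        self._buckets[timestamp // SECONDS_PER_DAY].append((timestamp, value))

    def expire(self, now):
        """Drop every day older than the retention window; return how many."""
        oldest_kept = now // SECONDS_PER_DAY - self.retention_days + 1
        stale = [day for day in self._buckets if day < oldest_kept]
        for day in stale:
            del self._buckets[day]  # one cheap operation per day of data
        return len(stale)

    def sample_count(self):
        return sum(len(bucket) for bucket in self._buckets.values())


store = DailyBuckets(retention_days=2)
for day in range(7, 11):  # one sample on each of days 7, 8, 9 and 10
    store.add(day * SECONDS_PER_DAY + 60, float(day))
print(store.expire(now=10 * SECONDS_PER_DAY + 5))  # days 7 and 8 are dropped
print(store.sample_count())
```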
Kinds of Metrics
{Volume, Frequency} × {Low, High}
Low Volume, High Frequency
5,6,5,6
Things observed infrequently
Almost always changes
Low storage overhead
Bulk operations are easy
Usually uninteresting
Low Volume, Low Frequency
5,5,5,6
Roughly the same as LVHF
High Volume, Low Frequency
5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,7,7

Constantly observed
But doesn’t change much
Optimizations!
Detect and record only level changes
Requires caching
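Recording only level changes, sketched in Python. The dictionary of last-recorded values is the cache the slide mentions; `LevelChangeRecorder` and the `queue.depth` metric name are hypothetical:

```python
_MISSING = object()  # sentinel: no value cached yet for this metric


class LevelChangeRecorder:
    """Write a sample only when a metric's value differs from the last one
    written; the cache of last values is what makes this possible."""

    def __init__(self):
        self._last = {}      # metric name -> last recorded value (the cache)
        self.recorded = []   # (metric, timestamp, value) actually stored

    def observe(self, metric, timestamp, value):
        if self._last.get(metric, _MISSING) == value:
            return False  # same level as before: skip the write
        self._last[metric] = value
        self.recorded.append((metric, timestamp, value))
        return True


recorder = LevelChangeRecorder()
observations = [5] * 9 + [6] * 12 + [7] * 2  # shaped like the slide's series
for t, v in enumerate(observations):
    recorder.observe("queue.depth", t, v)
print(recorder.recorded)
```

A reader reconstructs the value at any time by carrying the last recorded value forward.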
High Volume, High Frequency


34,4,7,345,6,4,2,54,67,5,6,55,74,5,3,2,5,6745…
Numeric vs String
Most will be numeric
Some are strings
Usually low frequency
Special handling
Numeric vs String
High-frequency strings are a sign you’re doing something wrong or need a different system.
Gauges
Current value of something
Operation: snapshot
Speedometer
Thermometer
CPU utilization
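A gauge can be modeled as a function that is read on demand, with each read stamped with a time. A minimal sketch in Python; the `Gauge` class and the dictionary standing in for a speedometer are hypothetical:

```python
import time
from typing import Callable, Optional, Tuple


class Gauge:
    """Reports the current value of something. Reading it is a snapshot:
    call the supplied function and stamp the result with a time."""

    def __init__(self, read: Callable[[], float]):
        self._read = read

    def snapshot(self, now: Optional[float] = None) -> Tuple[float, float]:
        """Return (timestamp, current value)."""
        return (time.time() if now is None else now, self._read())


speedometer = {"mph": 55.0}  # stand-in for a real sensor
gauge = Gauge(lambda: speedometer["mph"])
print(gauge.snapshot(now=100.0))
speedometer["mph"] = 62.5
print(gauge.snapshot(now=101.0))
```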
Counter
Exists as a set of operations
–  Operation: increment
–  Operation: decrement
Read by selecting over time and summing
Example: hits on a website
Different from unique hits
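A counter kept as a log of increments and decrements, read by summing over a time range. A minimal Python sketch; the `Counter` class is hypothetical, and a real system would pre-aggregate rather than scan every operation:

```python
class Counter:
    """A counter stored as a log of increment/decrement operations.
    Its value for a period is the sum of the operations in that period."""

    def __init__(self):
        self._ops = []  # (timestamp, delta)

    def increment(self, timestamp, n=1):
        self._ops.append((timestamp, n))

    def decrement(self, timestamp, n=1):
        self._ops.append((timestamp, -n))

    def read(self, start, end):
        """Sum of deltas with start <= timestamp < end."""
        return sum(delta for ts, delta in self._ops if start <= ts < end)


hits = Counter()
for ts in (10, 20, 30, 70):  # page hits at these seconds
    hits.increment(ts)
print(hits.read(0, 60))    # hits in the first minute
print(hits.read(60, 120))  # hits in the second minute
```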
Set (StatsD)
Number of uniquely seen items
Think: conditional counter
Example: number of unique visitors
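A set behaves like a counter that only increments the first time it sees an item. StatsD counts unique occurrences between flushes, so this hypothetical Python `UniqueSet` resets when `flush()` reports its count:

```python
class UniqueSet:
    """Counts distinct items: a counter that only increments the first
    time it sees a value. Flushing reports the count and starts over."""

    def __init__(self):
        self._seen = set()

    def add(self, item):
        """Return True if this item had not been seen since the last flush."""
        if item in self._seen:
            return False
        self._seen.add(item)
        return True

    def count(self):
        return len(self._seen)

    def flush(self):
        """Report the number of unique items and reset for the next interval."""
        n = len(self._seen)
        self._seen.clear()
        return n


visitors = UniqueSet()
for visitor in ["alice", "bob", "alice", "carol", "bob"]:
    visitors.add(visitor)
print(visitors.flush())  # 3 unique visitors this interval
print(visitors.count())  # 0: the set starts over after a flush
```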
Timer
How long something takes
Statistics (mean, median, min, max, percentiles)
How many times it has happened
Rate at which it happens
Uses a sliding window
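A timer in Python, sketched under stated assumptions: timestamps arrive in order, durations are in milliseconds, percentiles use the nearest-rank method, and every duration is kept in memory (real metrics libraries usually keep a bounded sample instead). The `Timer` class is hypothetical:

```python
import math
import statistics
from collections import deque


class Timer:
    """Records durations; reports summary statistics, a total count,
    and a rate computed over a sliding time window."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self.count = 0             # how many times it has happened
        self._durations = []       # every recorded duration (ms)
        self._recent = deque()     # event timestamps inside the window

    def record(self, timestamp, duration_ms):
        # Assumes timestamps arrive in non-decreasing order.
        self.count += 1
        self._durations.append(duration_ms)
        self._recent.append(timestamp)
        self._evict(timestamp)

    def _evict(self, now):
        # Slide the window: forget events at or before now - window.
        while self._recent and self._recent[0] <= now - self.window_seconds:
            self._recent.popleft()

    def rate(self, now):
        """Events per second over the last window_seconds."""
        self._evict(now)
        return len(self._recent) / self.window_seconds

    def percentile(self, p):
        """Nearest-rank percentile for an integer p in 1..100."""
        ordered = sorted(self._durations)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    def summary(self):
        """Statistics over all recorded durations (needs at least one)."""
        return {
            "count": self.count,
            "min": min(self._durations),
            "max": max(self._durations),
            "mean": statistics.mean(self._durations),
            "median": statistics.median(self._durations),
            "p95": self.percentile(95),
        }


timer = Timer(window_seconds=60.0)
for t, ms in enumerate([10, 20, 30, 40, 50]):
    timer.record(t, ms)
print(timer.summary())
print(timer.rate(now=4))   # 5 events in the last 60 s
print(timer.rate(now=61))  # events at t=0 and t=1 have left the window
```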
Histograms
Distribution of data
Example: when people visit your site
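A histogram of when people visit your site, bucketed by hour of day. A minimal Python sketch, assuming visits are Unix timestamps and hours are counted in UTC; `visits_by_hour` and the sample data are hypothetical:

```python
from collections import Counter
from datetime import datetime, timezone


def visits_by_hour(timestamps):
    """Histogram of visit times: how many visits fell in each UTC hour (0-23)."""
    per_hour = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in timestamps
    )
    return [per_hour.get(hour, 0) for hour in range(24)]


# Hypothetical visit timestamps (seconds since the epoch).
visits = [0, 3_600, 3_700, 86_400 + 13 * 3_600]
histogram = visits_by_hour(visits)
for hour, n in enumerate(histogram):
    if n:
        print(f"{hour:02d}:00 {'#' * n}")
```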
How Do You Do It?
If you make software: instrument it!
Java? https://github.com/codahale/metrics
Node.js? https://github.com/mikejihbe/metrics
Others? Of course.
If you run systems: instrument them!
Get data via agents
Get data via pollers
Considerations: inside or outside of your network
StatsD
https://github.com/etsy/statsd
Ingests, aggregates, flushes
Use a client to send your data
Pushes aggregations to:
Graphite
Databases
Flat files of JSON
Wherever
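StatsD clients send plain-text lines of the form `<name>:<value>|<type>` over UDP, by default to port 8125, with types such as `c` (counter), `ms` (timer), `g` (gauge) and `s` (set). A minimal Python sketch of such a client; `statsd_line`, `StatsdClient` and the metric names are hypothetical:

```python
import socket

VALID_TYPES = {"c", "ms", "g", "s"}  # counter, timer, gauge, set


def statsd_line(name, value, metric_type):
    """Format one metric in the StatsD text protocol: <name>:<value>|<type>."""
    if metric_type not in VALID_TYPES:
        raise ValueError(f"unsupported StatsD type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}"


class StatsdClient:
    """Fire-and-forget UDP sender; 8125 is StatsD's default port."""

    def __init__(self, host="127.0.0.1", port=8125):
        self._addr = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(self, name, value, metric_type):
        # UDP: the application does not wait on the metrics server.
        payload = statsd_line(name, value, metric_type).encode("utf-8")
        self._sock.sendto(payload, self._addr)

    def close(self):
        self._sock.close()


client = StatsdClient()
client.send("site.hits", 1, "c")             # counter
client.send("api.response_time", 320, "ms")  # timer
client.send("cpu.utilization", 42, "g")      # gauge
client.send("site.visitors", "alice", "s")   # set
client.close()
```

UDP is the usual choice here because instrumented code never blocks on the metrics server; the trade-off is that packets can be lost.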
Graphite
http://graphite.wikidot.com
Makes graphs
Pluggable backends (NEW!!!11)
Scaling problems
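Graphite's carbon daemon accepts a plaintext protocol: one `<metric path> <value> <timestamp>` line per sample over TCP, by default on port 2003. A minimal Python sketch of what such a push looks like; `carbon_line`, `send_to_carbon` and the metric path are hypothetical:

```python
import socket
import time


def carbon_line(path, value, timestamp=None):
    """One newline-terminated line of Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    if " " in path:
        raise ValueError("metric paths cannot contain spaces")
    return f"{path} {value} {int(timestamp)}\n"


def send_to_carbon(lines, host="127.0.0.1", port=2003, timeout=5.0):
    """Send already-formatted lines over TCP; 2003 is carbon's default plaintext port."""
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


print(carbon_line("servers.web01.api.response_ms", 320, timestamp=1_400_000_000), end="")
```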
Buy Enterprise Software
These exist, but I’m an open source hacker and can’t say much about them.
Roll Your Own
Easier than you think
Harder than you think
Roll Your Own
Three components
Ingestion
Aggregation/Rollup
Query/Graphing
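The aggregation/rollup component, sketched in Python: raw `(timestamp, value)` samples are folded into fixed windows. `roll_up` and `Rollup` are hypothetical; ingestion and query are left out:

```python
from dataclasses import dataclass


@dataclass
class Rollup:
    """Aggregate of all samples that fell into one window."""
    count: int
    minimum: float
    maximum: float
    total: float

    @property
    def mean(self):
        return self.total / self.count


def roll_up(samples, window_seconds):
    """Aggregate (timestamp, value) samples into fixed windows.

    Returns {window start: Rollup}, ordered by window start.
    """
    windows = {}
    for ts, value in samples:
        start = ts - ts % window_seconds  # align to the window boundary
        r = windows.get(start)
        if r is None:
            windows[start] = Rollup(1, value, value, value)
        else:
            r.count += 1
            r.minimum = min(r.minimum, value)
            r.maximum = max(r.maximum, value)
            r.total += value
    return dict(sorted(windows.items()))


raw = [(0, 1.0), (30, 3.0), (60, 10.0), (119, 20.0)]
for start, r in roll_up(raw, window_seconds=60).items():
    print(start, r.count, r.minimum, r.maximum, r.mean)
```

Keeping count, min, max and total (rather than only the mean) is a deliberate choice: those four can be merged again, so one-minute rollups can later be rolled up into hourly ones.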
Avoid Pileups
1 sample per second
3,600 samples per hour
86,400 samples per day
31,536,000 samples per year
At 1 KB of storage per sample?
(roughly) 32 gigabytes per year
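The arithmetic above, as a few lines of Python. Assumptions: 1 KB means 1,024 bytes, a year has 365 days, and the figure is for a single metric sampled once per second:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (ignoring leap years)


def yearly_storage_bytes(samples_per_second=1, bytes_per_sample=1024):
    """Raw storage for one metric over a year, before any rollup or expiry."""
    return samples_per_second * SECONDS_PER_YEAR * bytes_per_sample


raw = yearly_storage_bytes()
print(f"{SECONDS_PER_YEAR:,} samples -> {raw:,} bytes = {raw / 1e9:.1f} GB")
```

Multiply by the number of metrics you collect to see how quickly this piles up.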
No!
Measure all the right things!
Does this measurement matter? Not if:
You don’t care about it when it changes
You aren’t doing anything with it
You can’t figure out what actions to take from it (it’s meaningless)
Recent data will almost always be the most important.
Monitoring vs Aggregation
Graphite collects data that is
already aggregated. 
You are observing history
Looking for patterns
No alerting
Where Things Are Going
Complex Event Analysis
ESPER (my favorite)
– Mostly open source
Not enough projects, though :(
Data Intelligence
You need this if you don’t know what questions you ought to ask
Correlating signals in order to draw useful conclusions
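Correlating two signals can start as simply as a Pearson correlation over time-aligned samples. A minimal Python sketch; `pearson` and the sample data are hypothetical, and a high correlation only suggests a relationship worth investigating, not a cause:

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long, time-aligned series (-1..1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of the same length, at least 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-minute samples from two signals.
response_ms = [120, 130, 180, 240, 300]
db_queue = [2, 3, 6, 9, 12]
print(round(pearson(response_ms, db_queue), 3))
```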
Thanks!
@gdusbabek
Photos from the Flickr CC collection
(train, data, dump, truck, traffic, byproduct, watching, numbers, birds, moons, cake, business, guts, data 2, choices, flowers, metrics, gauge, counter, marbles, timer, windmills, logs, train, tower)
http://www.flickr.com/photos/vxla/4673817364/sizes/z/
http://www.flickr.com/photos/tensafefrogs/3649985674/sizes/z/
http://www.flickr.com/photos/seanhobson/3906189027/sizes/l/
http://www.flickr.com/photos/shankaronline/7291507876/sizes/l/
http://www.flickr.com/photos/honou/3350764803/sizes/l/
http://www.flickr.com/photos/jdickert/2152739544/sizes/l/
http://www.flickr.com/photos/28misguidedsouls/6517859113/sizes/z/
http://www.flickr.com/photos/55176801@N02/7911595842/sizes/o/
http://www.flickr.com/photos/johnkay/3764457497/sizes/l/
http://www.flickr.com/photos/andykirk/412600169/sizes/l/
http://www.flickr.com/photos/jeff-anderson/4385042770/sizes/l/
http://www.flickr.com/photos/sgis/6532363/sizes/o/
http://www.flickr.com/photos/whatbettertime/405735418/sizes/l/
http://www.flickr.com/photos/rachubarama/2709346242/sizes/l/
http://www.flickr.com/photos/femto-photography/4604878864/sizes/o/
http://www.flickr.com/photos/pixx0ne/5689978130/sizes/l/
http://www.flickr.com/photos/ruth_w/8432567657/sizes/l/
http://www.flickr.com/photos/wesley_lelieveld/8571911541/sizes/l/
http://www.flickr.com/photos/lifeasart/242208550/sizes/l/
http://www.flickr.com/photos/mrsenil/2219108948/sizes/l/
http://www.flickr.com/photos/cristic/2773883011/sizes/l/
http://www.flickr.com/photos/mattblaze/4491948497/sizes/l/
http://www.flickr.com/photos/kentish/43788618/sizes/o/
http://www.flickr.com/photos/dtanist/10809534755/sizes/l/
http://www.flickr.com/photos/jarodcarruthers/10372829184/sizes/l/

Measure All the Things! - Austin Data Day 2014