Events & Metrics: The Lifeblood of Webops
1. Events & Metrics
The Lifeblood Of Webops
Alexis Lê-Quôc (Product Guy) at Datadog
NYCBUG
July 6th, 2011
2. I <3 BSD
‣OpenBSD user since 2.8 (pf)
‣Love the documentation
‣m0n0wall/pfSense
‣ZFS-envy
3. What I’m going to talk about
‣Briefly, what we do and for whom
‣Where we started
‣The kind of data we deal with
‣How it all fits together
‣A few things we learned along the way
‣Q+A
4.
5. What we do
SaaS Platform for Dev & Ops
‣Aggregation
‣Correlation
‣Collaboration
7. The Mess
‣Too many data streams, too many silos
‣Too many choices to make, too often
‣Only getting worse as SaaS silos multiply
‣Separate Dev and Ops teams, looking at separate data streams
[Diagram: Dev team and Ops team connected to Usage Analytics, IAAS / PAAS, Issue Resolution, Servers and Devices, Applications, Cap. Planning, SDLC support, Monitoring, Hosting, Asset Mgmt and CDNs by flows labeled metrics, events, choices, changes, billing, insights and feedback]
14. TRENDS
Welcome developers
‣Graphite
‣statsd
Context Matters
‣Ganglia Event API
Large Datasets
‣OpenTSDB
Data Exploration
‣d3, protovis
Visible through Datadog and others
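One reason statsd and Graphite are easy for developers to adopt is that both accept simple line-oriented text. The sketch below formats a statsd datagram and a Graphite plaintext line. The metric names, host, and port are illustrative assumptions, not taken from the talk.

```python
import socket
import time


def statsd_line(name, value, metric_type):
    """Format one statsd datagram: <name>:<value>|<type> (c=counter, ms=timer, g=gauge)."""
    return f"{name}:{value}|{metric_type}"


def graphite_line(path, value, timestamp=None):
    """Format one Graphite plaintext line: <path> <value> <unix timestamp>, newline-terminated."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_statsd(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send. Host and port are assumed values for a local statsd daemon."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    print(statsd_line("web.requests", 1, "c"))
    print(graphite_line("web.latency.p95", 0.42, 1309910400), end="")
```

Because statsd listens on UDP, an application can emit a counter without blocking on the monitoring system, which is part of the appeal for application code.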
20. CLASSICS
ACID (e.g. SQL DBs)
‣Atomicity
‣Consistency
‣Isolation
‣Durability
BASE (e.g. DNS)
‣Basically Available
‣Soft-state
‣Eventual consistency
http://en.wikipedia.org/wiki/Eventual_consistency
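To make the BASE side concrete, here is a toy sketch of eventual consistency: replicas accept writes locally, may disagree for a while, and converge once they exchange state. The `Replica` class and the last-write-wins merge rule are assumptions chosen for brevity. This is not how DNS or any particular datastore resolves conflicts.

```python
class Replica:
    """A toy replica holding one value tagged with a (timestamp, writer) version."""

    def __init__(self, name):
        self.name = name
        self.version = (0, "")  # (logical timestamp, writer name) breaks ties deterministically
        self.value = None

    def write(self, value, timestamp):
        # Accept the write locally without coordinating with peers (basically available).
        self.version = (timestamp, self.name)
        self.value = value

    def merge(self, other):
        # Anti-entropy step: adopt the peer's state if its version is newer (last write wins).
        if other.version > self.version:
            self.version, self.value = other.version, other.value


def sync_all(replicas):
    """Exchange state between every pair of replicas; afterwards they all agree."""
    for a in replicas:
        for b in replicas:
            a.merge(b)
            b.merge(a)


if __name__ == "__main__":
    a, b, c = Replica("a"), Replica("b"), Replica("c")
    a.write("10.0.0.1", timestamp=1)
    c.write("10.0.0.2", timestamp=2)
    print([r.value for r in (a, b, c)])  # replicas disagree before syncing (soft state)
    sync_all([a, b, c])
    print([r.value for r in (a, b, c)])  # replicas agree after syncing
```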
21. NEWCOMER
DIRT (e.g. real-time web)
‣Data
‣Intensive
‣Real
‣Time
Bryan Cantrill: http://dtrace.org/resources/bmc/DIRT.pdf
22. Aggregation
Constant data influx
Large data sets
Correlation
On-demand visualization
Background data analysis
Collaboration
Real-time updates
On-the-fly data analysis
23. Aggregation (BASE)
Constant data influx
Large data sets
Correlation
On-demand visualization
Background data analysis
Collaboration
Real-time updates
On-the-fly data analysis
24. Aggregation (BASE, DIRT)
Constant data influx
Large data sets
Correlation
On-demand visualization
Background data analysis
Collaboration
Real-time updates
On-the-fly data analysis
25. Aggregation (BASE, DIRT)
Constant data influx
Large data sets
Correlation (BASE)
On-demand visualization
Background data analysis
Collaboration
Real-time updates
On-the-fly data analysis
26. Aggregation (BASE, DIRT)
Constant data influx
Large data sets
Correlation (BASE)
On-demand visualization
Background data analysis
Collaboration (DIRT)
Real-time updates
On-the-fly data analysis
27. Aggregation (BASE, DIRT)
Constant data influx
Large data sets
Correlation (BASE)
On-demand visualization
Background data analysis
Collaboration (DIRT)
Real-time updates
On-the-fly data analysis
Datadog = DIRT + BASE + a tiny bit of ACID
28. How It All Fits Together
http://www.flickr.com/photos/tom-margie/1253798184/
38. ON-PREMISE TRAITS
Compute
‣Fast
‣Inelastic
Network
‣Fast
‣Localized
Storage
‣Fast
‣Centralized
‣Redundant
http://www.flickr.com/photos/theplanetdotcom/4879419788/sizes/l/in/photostream/
39. ON-PREMISE TRAITS
Compute
‣Fast
‣Inelastic
Network
‣Fast
‣Localized
Storage
‣Fast
‣Centralized
‣Redundant
Management
‣People-based
‣Full access
http://www.flickr.com/photos/theplanetdotcom/4879419788/sizes/l/in/photostream/
50. Storage
[Chart: Latency vs. Capacity]
‣Amazon S3 (BASE)
‣Apache Cassandra (BASE)
‣PostgreSQL (ACID)
‣Redis (DIRT)
51. Storage
[Chart: Latency vs. Capacity]
‣Amazon S3 (BASE)
‣Apache Cassandra (BASE)
‣PostgreSQL (ACID)
‣Redis (DIRT)
Overlaid annotations: Latency, Jittery throughput, Limited, Low memory
65. Network Block Storage Is The Dark Side
‣Bait For Enterprise Customers
‣Hard Problem For Cloud Providers
66. Some Do’s And Don’ts
Don’t rely on networked block storage
‣Small data sets only if you have to
Don’t trust data-at-rest
‣Copy, replicate, back up
Do use S3 if you can
‣Object semantics a limitation
‣Slow but durable
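The "copy, replicate, back up" advice can be paired with checksums so that silent corruption of data-at-rest is detected rather than assumed away. The sketch below is a minimal local-filesystem illustration. The function names and the choice of SHA-256 are assumptions, and a real setup would replicate to separate machines or to an object store.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source, destinations):
    """Copy source to each destination and check every copy against the source checksum."""
    expected = sha256_of(source)
    for dest in destinations:
        Path(dest).parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, dest)
        if sha256_of(dest) != expected:
            raise IOError(f"checksum mismatch for {dest}")
    return expected


def verify(paths, expected):
    """Return the paths whose current contents no longer match the recorded checksum."""
    return [p for p in paths if sha256_of(p) != expected]
```

Running `verify` periodically against the recorded checksum is one way to act on "don't trust data-at-rest": a non-empty result means a replica should be restored from a good copy.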
68. Compute
[Chart: “Performance” vs. Number of Nodes]
‣ACID: Scale up, Shard
‣BASE, DIRT: Add more Nodes
69. Some Do’s And Don’ts
Don’t rely on scale-ups
‣Low memory a hard limit for DBs
‣Noisy neighbors
‣Individual performance poor and jittery
Scale out
‣First scale up
‣Then Shard
‣Parallelize across machines
‣Vector-processing via GPUs
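As a rough illustration of the "then shard" step, the sketch below routes keys to shards with a stable hash so that each shard can live on a separate machine. The key format and the modulo scheme are assumptions for illustration, not a description of Datadog's implementation.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index with a stable hash (Python's built-in hash() varies per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Group keys by shard so each group can be handled by a different machine."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


if __name__ == "__main__":
    # Hypothetical metric keys, for illustration only.
    keys = [f"host{i}.cpu.user" for i in range(10)]
    for index, members in enumerate(partition(keys, 4)):
        print(index, members)
```

Plain modulo sharding moves most keys when the shard count changes. Consistent hashing is a common alternative when nodes are added or removed often.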