Metrics-Driven Engineering
3. How do people use Etsy?
How many new visits?
How many listings created?
How many registrations?
How many convos sent?
How many purchases?
How many new shops?
4. What is the application doing?
Search indexing?
How fast are pages generating?
Async tasks currently in queue?
Developer API auth and rate limiting?
Images resized and stored?
Error and warning rates?
5. Are the servers in good shape?
Replication slave lag?
Memcache hits/misses?
Available connections?
Database queries per second?
Total outgoing bandwidth?
CPU, Memory, I/O?
12. $314 Million GMS 2010
$180 Million GMS 2009
$87 Million GMS 2008
$26 Million GMS 2007
credit: pentarux (flickr)
13. 25 Million Unique Visitors
1 Billion page views per month
24. Config Flags
Enable and disable features quickly
$cfg = array(
    'checkout'   => array('enabled' => 'on'),
    'homepage'   => array('enabled' => 'on'),
    'profiles'   => array('enabled' => 'on'),
    'new_search' => array('enabled' => 'off'),
);
Plus “admin-only,” percentage ramp-up, A/B testing, whitelists, blacklists, etc.
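The flag table and the percentage ramp-up mentioned above can be sketched roughly as follows. This is a minimal Python illustration, not Etsy's implementation: the `rampup` state, the `percent` and `whitelist` keys, the `new_profiles` flag, and the hashing scheme are all assumptions made for the example.

```python
import hashlib

# Hypothetical flag table modeled on the PHP $cfg array from the slide.
# The 'rampup' state and the 'percent' / 'whitelist' keys are assumptions.
CFG = {
    "checkout": {"enabled": "on"},
    "new_search": {"enabled": "off"},
    "new_profiles": {"enabled": "rampup", "percent": 10,
                     "whitelist": {"admin_user"}},
}


def bucket(feature, user_id):
    """Map (feature, user) to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha1(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature, user_id, cfg=CFG):
    """Return True if the feature should be shown to this user."""
    flag = cfg.get(feature)
    if flag is None:
        return False          # unknown flags default to off
    state = flag["enabled"]
    if state == "on":
        return True
    if state == "off":
        return False
    # Ramp-up: whitelisted users always see the feature; everyone else
    # sees it only if their bucket falls under the configured percentage.
    if user_id in flag.get("whitelist", ()):
        return True
    return bucket(feature, user_id) < flag.get("percent", 0)
```

Hashing the feature name together with the user id keeps each user's answer stable between requests while giving different features independent buckets.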
36. A: Well, the Ops team manages the network, racks
the servers, installs the monitoring tools, wears
the pagers, blah, blah, blah...
38. Logging
Graphing
OPS ENG
Trending
Alerting
43. Cacti (network, SNMP)
Ganglia (machines)
Graphite (application)
Splunk (log analysis, nightly reports)
Nagios (alerting)
Logging
Logster
StatsD
47. Graphite
Single-instance
Create new metrics on-the-fly
Customize via URLs and display functions
52. web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
failed. Reason: wrong password for ...
59. LogFormat "%{True-Client-IP}i %l %t \"%r\" %>s %b \"%{Referer}i\" \
\"%{User-Agent}i\" \
%{etsy_shop_id}n %{etsy_uaid}n %V \
%{etsy_ab_selections}n \
%{etsy_request_uuid}n \
%{etsy_api_consumer_key}n \
%{etsy_api_method_name}n \
%{php_memory_usage_bytes}n \
%{php_time_microsec}n %D" combined
65. web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0201 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.
web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling.
web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling.
web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
web0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0003 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.
web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
67. Logster
Run by cron
Keeps a cursor on your log file
Aggregate lines any way you want
Output to Ganglia or Graphite
Simple parsers
github.com/etsy
68. web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
failed. Reason: wrong password for ...
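Splitting a line like the one above into named fields might look like the following sketch. Only `log_level` is a field name taken from the talk; `host`, `timestamp`, `namespace`, `request_id`, and `message` are guesses at what each bracketed segment means, and the timestamp parsing assumes an English locale.

```python
import re
from datetime import datetime

# One named group per segment of the log line. Apart from 'log_level',
# the group names are assumptions made for this example.
LINE_RE = re.compile(
    r"^(?P<host>\S+) "
    r"\[(?P<timestamp>[^\]]+)\] "
    r"\[(?P<log_level>[^\]]+)\] "
    r"\[(?P<namespace>[^\]]+)\] "
    r"\[(?P<request_id>[^\]]+)\] "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Example timestamp: "Fri Mar 04 16:27:48 2011"
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%a %b %d %H:%M:%S %Y"
    )
    return fields
```

Returning `None` for lines that do not match lets a caller count or skip unparseable input instead of failing on it.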
70. if fields['log_level'] == "fatal":
    self.fatals += 1
elif fields['log_level'] == "error":
    self.errors += 1
elif fields['log_level'] == "warning":
    self.warnings += 1
...
71. MetricObject("fatals",
(self.fatals / self.duration), "per sec")
MetricObject("errors",
(self.errors / self.duration), "per sec")
MetricObject("warning",
(self.warnings / self.duration), "per sec")
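Put together, the counting step and the per-second rates shown above could be sketched as one small class. This is a standalone illustration that does not import Logster: `Metric` is a stand-in for the `MetricObject` on the slide, and the class and method names are assumptions rather than Logster's actual parser interface.

```python
from collections import Counter, namedtuple

# Stand-in for the MetricObject used on the slide; not the Logster class.
Metric = namedtuple("Metric", ["name", "value", "units"])


class ErrorLogCounter:
    """Counts log lines by level and reports them as per-second rates."""

    # log_level value -> metric name (names chosen for this example)
    LEVELS = {"fatal": "fatals", "error": "errors", "warning": "warnings"}

    def __init__(self):
        self.counts = Counter()

    def add(self, fields):
        """Record one parsed line; levels we do not track are ignored."""
        level = fields.get("log_level")
        if level in self.LEVELS:
            self.counts[level] += 1

    def rates(self, duration):
        """Return one Metric per level for a window of `duration` seconds,
        then reset the counters for the next window."""
        if duration <= 0:
            raise ValueError("duration must be positive")
        metrics = [
            Metric(name, self.counts[level] / duration, "per sec")
            for level, name in self.LEVELS.items()
        ]
        self.counts.clear()
        return metrics
```

Dividing by the window length rather than reporting raw counts keeps the graph comparable even when the interval between runs varies.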
74. StatsD
Network daemon (node.js)
Accepts data over UDP
Flushes to Graphite every 10 sec
One line of code
github.com/etsy
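To show why instrumenting with StatsD comes down to one line at the call site, here is a minimal Python sketch of a counter increment sent over UDP. It assumes the plain `name:value|c` counter format and StatsD's default port 8125; the helper names and the `logins.failed` metric are made up for the example, and this is not the client Etsy uses.

```python
import socket


def format_counter(name, count=1):
    """Build a StatsD counter datagram, e.g. b"logins.failed:1|c"."""
    return f"{name}:{count}|c".encode("ascii")


def increment(name, host="127.0.0.1", port=8125, count=1):
    """Fire-and-forget: send one counter update over UDP.

    UDP needs no connection and no reply, so the caller does not
    wait on the metrics daemon.
    """
    payload = format_counter(name, count)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

In application code the call would then be a single line such as `increment("logins.failed")`.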
85. http://graphite/render?from=-1hours&width=600&height=200&target=webs.errorLog.warning&rawData=1
webs.errorLog.warning,1318444930,1318448530,60|5.0,1.0,3.0,1.0,0.0,9.0,0.0,1.0,3.0,2.0,1.0,6.0,2.0,6.0,3.0,6.0,4.0,4.0,2.0,1.0,1.0,8.0,2.0,3.0,6.0,3.0,5.0,3.0,0.0,4.0,6.0,2.0,0.0,2.0,0.0,4.0,0.0,3.0,1.0,3.0,4.0,2.0,10.0,3.0,0.0,6.0,0.0,4.0,2.0,5.0,18.0,1.0,1.0,2.0,1.0,8.0,5.0,1.0,1.0,None
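The rawData response above has the shape `target,start,end,step|value,value,...`, with `None` marking an interval that has no data yet. A small Python sketch of reading it into a structure follows; the function name and the dict keys are chosen for this example.

```python
def parse_raw_data(body):
    """Parse a Graphite rawData response into a list of series dicts."""
    series = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        header, _, values = line.partition("|")
        # Split from the right: the target itself may contain commas
        # when it is a function call such as sumSeries(a,b).
        target, start, end, step = header.rsplit(",", 3)
        points = [None if v == "None" else float(v)
                  for v in values.split(",")]
        series.append({
            "target": target,
            "start": int(start),   # epoch seconds of the first point
            "end": int(end),       # epoch seconds after the last point
            "step": int(step),     # seconds between points
            "values": points,
        })
    return series
```

For the response on the slide, `(end - start) / step` is 3600 / 60, which matches the 60 values returned for the one-hour window.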
92. Kind of Hard :-/
<a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or+Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600">
<img src="http://graphite.etsycorp.com/render?from=-1hours&width=280&height=220&title=File+or+Script+Not+Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600">
</a>
93. Super Easy!
$g = new Graphite($time);
$g->setTitle('File Not Found');
$g->addMetric('webs.errorLog.notExist', '#00cc00');
echo $g->getDashboardHTML(280, 220);
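The idea behind the helper above, hiding the long render URL behind a few method calls, can be sketched in Python as follows. This is an illustration only, not a port of Etsy's PHP `Graphite` class: the class name, method names, default base URL, and the 800x600 size of the linked graph are assumptions, and only render parameters that appear on the previous slide are used.

```python
from html import escape
from urllib.parse import urlencode


class GraphiteGraph:
    """Collects render parameters and emits a URL or an HTML snippet."""

    def __init__(self, base_url="http://graphite/render", time_from="-1hours"):
        self.base_url = base_url
        self.time_from = time_from
        self.title = None
        self.targets = []
        self.colors = []

    def set_title(self, title):
        self.title = title

    def add_metric(self, target, color=None):
        self.targets.append(target)
        if color is not None:
            self.colors.append(color)

    def url(self, width, height, hide_legend=False):
        """Build the render URL; urlencode handles all the escaping."""
        params = [("from", self.time_from), ("width", width), ("height", height)]
        if self.title:
            params.append(("title", self.title))
        if hide_legend:
            params.append(("hideLegend", 1))
        params.append(("yMin", 0))
        params.extend(("target", t) for t in self.targets)
        if self.colors:
            params.append(("colorList", ",".join(self.colors)))
        return self.base_url + "?" + urlencode(params)

    def dashboard_html(self, width, height, full_size=(800, 600)):
        """Small graph without a legend, linked to a larger version."""
        big = self.url(*full_size)
        thumb = self.url(width, height, hide_legend=True)
        return '<a href="{}"><img src="{}"></a>'.format(
            escape(big, quote=True), escape(thumb, quote=True)
        )
```

Usage mirrors the PHP on the slide: create the object, call `set_title('File Not Found')` and `add_metric('webs.errorLog.notExist', '#00cc00')`, then print `dashboard_html(280, 220)`.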