Operational
Efficiency
Hacks
                          John Allspaw
          Operations Engineering, Flickr
who am I?
Manage the Flickr Operations group
Wrote a geeky book:
“Efficiencies”
   Doing more with the robots you’ve got
   Doing more with the humans you’ve got
Some optimization
         “rules”

-   Don’t rely on being able to tweak anything.
-   Don’t waste too much time tuning when
    you have no evidence it’ll matter.
Optimization “rules”
[Figure: performance tuning gains plotted against time spent tuning]
Optimization “rules”




“We should forget about small efficiencies,
say about 97% of the time: premature
optimization is the root of all evil.”
                         Knuth, (or Hoare)
however...
Optimization “rules”

That doesn’t give us an excuse to be lazy and
inefficient.


We lean on the experience of people in the
community for evidence that tuning(s) might
be a worthwhile thing to do.
Optimization “rules”

“Yet we should not pass up our
opportunities in that critical 3 percent.”
                       Knuth, (or Hoare)
So...
[Figure: tuning-gains chart annotated “stop somewhere in here”, between “obvious tuning wins” and “OMG I'm wasting !@#$ time for no reason”]
Our Context

-   24 TB of MySQL data
-   32k/sec of MySQL writes
-   120k/sec of MySQL reads
-   6 PB of photos
-   10TB storage eaten per day
-   15,362 service monitors (alerts)
Infrastructure Hacks

-   Examples of what changing software can do
          (plain old-fashioned performance tuning)
-   Examples of what changing hardware can do
                              (yay for Mr. Moore!)
Leaning on compilers
(synthetic PHP benchmarks, not real-world)




(http://sebastian-bergmann.de/archives/634-PHP-GCC-ICC-Benchmark.html)
PHP (real-world)




php 4.4.8 to php 5.2.8 migration
Can now handle more with less




same taste, less filling
Image Processing


-   2004, Flickr was using ImageMagick for
    image processing (version 6.1.9)
-   Changed to GraphicsMagick, about 15%
    faster at the time (version 1.1.5)
-   Only need a subset of ImageMagick features
    anyway for our purposes
Image Processing

-   OpenMP support
    (http://en.wikipedia.org/wiki/Openmp)
    - Allows parallelization of processing jobs,
       using multiple cores working on the same
       image
    - Some algorithms have more parallelization
       than others
Image Processing
-   Test script
     -   7 large-ish DSLR photos
     -   Cascade resizing each to 6 smaller sizes,
         semi-typical for Flickr’s workload
     -   Each resize processed serially
Image Processing
   compiler differences




(GM version 1.1.14, non-OpenMP)
Image Processing
          OpenMP differences

[Figure: chart comparing OpenMP and non-OpenMP builds, with the “OpenMP advantage” marked]

(gcc 4.1.2, on quad core Xeon L5335 @ 2.00GHz)
Image Processing
   CPU differences
Diagonal Scaling
-   Vertically scaling your already horizontally-scaled nodes
-   a.k.a. “tech refresh”
-   a.k.a. “Moore’s Law Surfing”
Diagonal Scaling
We replaced 67 “old” webservers with 18 “new”:

servers   CPUs         RAM          drives       total power (W)
          per server   per server   per server   @60% peak

   67         2          4GB         1x80GB         8763.6
   18         8          4GB         1x146GB        2332.8

~70% LESS power
49U LESS rack space
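
The “~70% LESS power” and “49U LESS rack space” callouts can be checked against the webserver numbers on this slide. A minimal sketch, assuming each server occupies one rack unit (the slide implies this but does not say so); the function name is made up for illustration:

```python
def percent_reduction(old: float, new: float) -> float:
    """Return how much smaller `new` is than `old`, as a percentage of `old`."""
    return (old - new) / old * 100.0


# Figures taken from the webserver slide.
old_servers, new_servers = 67, 18
old_power_w, new_power_w = 8763.6, 2332.8

power_saved_pct = percent_reduction(old_power_w, new_power_w)

# Assumption: one rack unit (1U) per server, old and new alike.
rack_units_saved = old_servers - new_servers

print(f"power saved: {power_saved_pct:.1f}%")
print(f"rack units saved: {rack_units_saved}U")
```

The power saving comes out a little above 70%, which is consistent with the rounded figure on the slide.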
Diagonal Scaling
We replaced 23 “old” image processing boxes with 8 “new”:

servers   photos/min   rack (U)   total power (W)
                                  @60% peak

   23        1035         23         3008.4
    8        1120          8         1036.8

~75% FASTER
15U LESS rack space
65% LESS power

[Photos: “from this” / “to this”]
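
The image processing numbers can also be broken down per box. A rough sketch, assuming the photos/min and power columns are fleet-wide totals (the slide does not say, but both divide evenly by the server counts); the `Fleet` class is hypothetical. The slide does not explain how the “~75% FASTER” figure was measured, so this sketch does not try to reproduce it.

```python
from dataclasses import dataclass


@dataclass
class Fleet:
    servers: int
    photos_per_min: float  # assumed to be the total across the fleet
    power_w: float         # assumed to be the total, at 60% of peak

    def photos_per_min_per_server(self) -> float:
        return self.photos_per_min / self.servers

    def watts_per_photo_per_min(self) -> float:
        return self.power_w / self.photos_per_min


old = Fleet(servers=23, photos_per_min=1035, power_w=3008.4)
new = Fleet(servers=8, photos_per_min=1120, power_w=1036.8)

power_saved_pct = (old.power_w - new.power_w) / old.power_w * 100.0

print(f"old: {old.photos_per_min_per_server():.0f} photos/min per box")
print(f"new: {new.photos_per_min_per_server():.0f} photos/min per box")
print(f"power saved: {power_saved_pct:.1f}%")
```

Under that assumption, the 8 new boxes slightly exceed the total throughput of the 23 old ones while drawing about a third of the power.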
What do you do with
    old/slow machines?

-   Liquidate
-   Re-purpose as dev/staging/etc
-   “offline” tasks
Offline Tasks
-   Out-of-band/asynchronous queuing and execution
    system, for non-realtime tasks
-   See here:
    http://code.flickr.com/blog/2008/09/26/flickr-engineers-do-it-offline/

-   See Myles Grant talk about it more here:
    http://en.oreilly.com/velocity2009/public/schedule/detail/7552
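
The general shape of an out-of-band task queue can be sketched in a few lines. This is a toy, in-process illustration using only the Python standard library, not Flickr's system (which is described at the links above); `enqueue`, `worker` and the sample jobs are made up for the example.

```python
import queue
import threading

tasks = queue.Queue()
results = []


def worker() -> None:
    """Pull jobs off the queue and run them, outside the request path."""
    while True:
        job = tasks.get()
        if job is None:        # sentinel: shut the worker down
            tasks.task_done()
            break
        func, args = job
        try:
            results.append(func(*args))
        finally:
            tasks.task_done()


def enqueue(func, *args) -> None:
    """Called from the 'realtime' side; returns immediately."""
    tasks.put((func, args))


thread = threading.Thread(target=worker, daemon=True)
thread.start()

enqueue(pow, 2, 10)                  # stand-ins for real non-realtime work
enqueue(str.upper, "resize later")

tasks.put(None)   # tell the worker to stop once the queue is drained
tasks.join()
thread.join()
print(results)
```

The caller only pays the cost of `tasks.put`; the work itself runs whenever the worker gets to it, which is the property that lets old, slow machines do this kind of job.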
Runbook Hacks




“WTF HAPPENED LAST NIGHT?!”
Why?
As infrastructure grows, try to keep the
Humans:Machines ratio from getting out of
hand
Some of the How:
  -   teach machines to build themselves
  -   teach machines to watch themselves
  -   teach machines to fix themselves
  -   reduce MTTR by streamlining
Automated
              Infrastructure
-   If there is only one thing you do, automatic
    configuration and deployment management
    should be it.
-   See:
     -     Opscode/Chef (http://opscode.com/)

     -     Puppet (http://reductivelabs.com/products/puppet/)

     -     System Imager/Configurator
           (http://wiki.systemimager.org)
Configuration Management Codeswarm
Time
Machine time is cheaper than human time.


If a failure results in some commands being
run to ‘fix’ it, make the machines do it.


(i.e., don’t wake people up for stupid things!)
Aggregate Monitoring




Don’t care about single nodes, only care about delta
change of metrics/faults
   -   Warn (email) on X % change
   -   Page (wake up) on Y % change
High and low water marks for some metrics
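
The warn/page split on percentage change might look something like the sketch below. The thresholds stand in for the X and Y on the slide, and the zero-baseline rule is an assumption of this example, not something the talk specifies.

```python
def classify_change(previous: float, current: float,
                    warn_pct: float, page_pct: float) -> str:
    """Classify the change in an aggregate metric (summed over all nodes).

    Returns 'ok', 'warn' (send email) or 'page' (wake someone up).
    """
    if previous == 0:
        # Assumption: with no baseline, any non-zero value is worth a page.
        return "page" if current != 0 else "ok"
    change_pct = abs(current - previous) / abs(previous) * 100.0
    if change_pct >= page_pct:
        return "page"
    if change_pct >= warn_pct:
        return "warn"
    return "ok"


# Hypothetical thresholds: warn at 10% change, page at 30% change.
print(classify_change(1000, 1040, warn_pct=10, page_pct=30))  # ok
print(classify_change(1000, 1150, warn_pct=10, page_pct=30))  # warn
print(classify_change(1000, 600, warn_pct=10, page_pct=30))   # page
```

Because the input is a cluster-wide total, a single node dropping out barely moves the number, while a real incident does.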
Self-Healing
Make service monitoring fix common failure
scenarios, notify us later about it.


Daemons/processes run on machines, will take
corrective action under certain conditions, and
report back with what they did.


Can greatly reduce your mean time to recovery
(MTTR)
Basic Apache Example

1. Webserver not running?
2. Under certain conditions, try to start it, and
   email that this happened. (I’ll read it
   tomorrow)
3. Won’t start? Assume something’s really
   wrong, so don’t keep trying (email that, too)
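
The three steps above can be sketched as a small function. How the check, the start command and the email are actually done is not stated in the talk, so they are passed in as callables here; `heal_webserver` and `max_attempts` are names invented for this example, and the “certain conditions” from step 2 are left out.

```python
from typing import Callable


def heal_webserver(is_running: Callable[[], bool],
                   start: Callable[[], None],
                   notify: Callable[[str], None],
                   max_attempts: int = 1) -> bool:
    """Try to start a dead webserver, and email either way.

    Returns True if the webserver is up when the function returns.
    """
    if is_running():
        return True
    for _ in range(max_attempts):
        start()
        if is_running():
            notify("webserver was down; restarted it")
            return True
    # Won't start: assume something is really wrong and stop trying.
    notify("webserver is down and will not start; giving up")
    return False


# Fake webserver for demonstration: starting it always succeeds.
state = {"up": False}
messages = []
ok = heal_webserver(lambda: state["up"],
                    lambda: state.update(up=True),
                    messages.append)
print(ok, messages)
```

Capping the number of attempts is what keeps a genuinely broken server from being restarted in a loop.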
MySQL Self-Healing
   Some MySQL Issues “fixed” by the machines


- Kill long-running SELECT queries (marked safe to kill)
- Queries not safe to kill are marked by the application
   as “NO KILL” in comments
- Run EXPLAIN on killed queries, and report the results
- Keep track of the query types and databases that need
   the most killing, produce a “DBs that Suck” report
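
The selection logic could look roughly like this. The sketch works on rows shaped like the output of MySQL's SHOW FULL PROCESSLIST (columns `Id`, `Command`, `Time`, `Info`); the 60-second threshold, the function name and the sample queries are assumptions, and actually issuing KILL and EXPLAIN is left out.

```python
def queries_to_kill(processlist, max_seconds=60):
    """Return the Ids of long-running SELECTs not marked NO KILL."""
    victims = []
    for row in processlist:
        sql = (row.get("Info") or "").strip()
        if row.get("Command") != "Query":
            continue                      # sleeping or other non-query thread
        if not sql.upper().startswith("SELECT"):
            continue                      # only SELECTs are candidates
        if "NO KILL" in sql:
            continue                      # application marked it unsafe to kill
        if row.get("Time", 0) > max_seconds:
            victims.append(row["Id"])
    return victims


# Hypothetical process list for demonstration.
rows = [
    {"Id": 1, "Command": "Query", "Time": 300, "Info": "SELECT * FROM photos"},
    {"Id": 2, "Command": "Query", "Time": 300,
     "Info": "SELECT * FROM photos /* NO KILL */"},
    {"Id": 3, "Command": "Query", "Time": 5, "Info": "SELECT 1"},
    {"Id": 4, "Command": "Query", "Time": 300, "Info": "UPDATE photos SET x = 1"},
    {"Id": 5, "Command": "Sleep", "Time": 900, "Info": None},
]
print(queries_to_kill(rows))
```

The filter errs on the side of not killing: a statement that begins with a comment rather than SELECT is simply ignored.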
MySQL Self-Healing
  Some MySQL Replication issues “fixed” by the
  machines, by error
- Skip errors that can safely be skipped and
  restart slave threads
- Force refetch of replication binlogs on:
       - 1064 (ER_PARSE_ERROR)
- Re-run queries on:
       - 1205 (ER_LOCK_WAIT_TIMEOUT)
       - 1213 (ER_LOCK_DEADLOCK)
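
The per-error handling above amounts to a lookup from error number to action. A minimal sketch: the three error numbers are the ones on the slide, while the action names, the `skippable` parameter (the slide does not list which errors are safe to skip) and the fallback for unknown errors are assumptions.

```python
REFETCH_BINLOGS = {1064}      # ER_PARSE_ERROR
RERUN_QUERY = {1205, 1213}    # ER_LOCK_WAIT_TIMEOUT, ER_LOCK_DEADLOCK


def replication_action(errno: int, skippable=frozenset()) -> str:
    """Pick a corrective action for a replication error number."""
    if errno in skippable:
        return "skip_and_restart_slave"
    if errno in REFETCH_BINLOGS:
        return "refetch_binlogs"
    if errno in RERUN_QUERY:
        return "rerun_query"
    # Assumption: anything unrecognised goes to a human.
    return "notify_human"


print(replication_action(1064))
print(replication_action(1213))
print(replication_action(9999))
```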
Troubleshooting
Code and Config
 Deploy Logs

1. ESSENTIAL
2. MANDATORY
Communications
•   Internal IRC

      -   For ongoing discussions
      -   Logged, so “infinite” scrollback
•   IM Bot (built on libyahoo2.sf.net)

      -   For production changes
      -   Broadcasts all to all contacts
      -   Logged, and injected into IRC
      -   IM Status = who is in primary/secondary on-call

•   All of IRC and IM Bot slurped into a search index
[Screenshots: log entries annotated with “when”, “what”, “detailed what*” and “who”; time of last deploy at top of Ganglia]

*also points to what commands should be used to back out the changes
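
The fields called out in those annotations suggest a simple record per change. A sketch of one possible shape; the class, field names and sample values are all hypothetical and are not taken from Flickr's tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DeployLogEntry:
    when: datetime   # timestamp, for correlating with graphs and alerts
    who: str
    what: str        # one-line summary
    detail: str      # the "detailed what"
    backout: str     # commands to use to back out the change


def format_entry(entry: DeployLogEntry) -> str:
    return (f"{entry.when.isoformat()} {entry.who}: {entry.what} | "
            f"{entry.detail} | back out: {entry.backout}")


entry = DeployLogEntry(
    when=datetime(2009, 4, 1, 12, 0, tzinfo=timezone.utc),
    who="alice",
    what="config push",
    detail="changed webserver pool settings",
    backout="re-deploy the previous config revision",
)
line = format_entry(entry)
print(line)
```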
IM Bot (timestamps help correlation)
[Screenshot: all IRC and IM bot traffic goes into a searchable history]
Morals of Our Stories

-   Optimizations can be a Very Good Thing™
-   Weigh time spent optimizing against
    expected gains
-   Lean on others to learn what “expected
    gains” look like in different scenarios
-   Plain old-fashioned intuition
Some Wisdom Nuggets


     Jon Prall’s 85 WebOps Rules:
  http://jprall.vox.com/library/post/85-operations-rules-to-live-by.html
Questions?

http://www.flickr.com/photos/ebarney/3348965637/
http://www.flickr.com/photos/dgmiller/1606071911/
http://www.flickr.com/photos/dannyboyster/60371673/
http://www.flickr.com/photos/bright/189338394/
http://www.flickr.com/photos/nickwheeleroz/2475011402/
http://www.flickr.com/photos/dramaqueennorma/191063346/
http://www.flickr.com/photos/telstar/2861103147/
http://www.flickr.com/photos/norby/2309046043/
http://www.flickr.com/photos/allysonk/201008992/

Más contenido relacionado

La actualidad más candente

Vault Open Source vs Enterprise v2
Vault Open Source vs Enterprise v2Vault Open Source vs Enterprise v2
Vault Open Source vs Enterprise v2Stenio Ferreira
 
스타트업 사례로 본 로그 데이터 분석 : Tajo on AWS
스타트업 사례로 본 로그 데이터 분석 : Tajo on AWS스타트업 사례로 본 로그 데이터 분석 : Tajo on AWS
스타트업 사례로 본 로그 데이터 분석 : Tajo on AWSMatthew (정재화)
 
Oracle GoldenGate Demo and Data Integration Concepts
Oracle GoldenGate Demo and Data Integration ConceptsOracle GoldenGate Demo and Data Integration Concepts
Oracle GoldenGate Demo and Data Integration ConceptsFumiko Yamashita
 
[OpenInfra Days Korea 2018] Day 2 - E5: GPU on Kubernetes
[OpenInfra Days Korea 2018] Day 2 - E5: GPU on Kubernetes[OpenInfra Days Korea 2018] Day 2 - E5: GPU on Kubernetes
[OpenInfra Days Korea 2018] Day 2 - E5: GPU on KubernetesOpenStack Korea Community
 
게임을 위한 최적의 AWS DB 서비스 선정 퀘스트 깨기::최유정::AWS Summit Seoul 2018
게임을 위한 최적의 AWS DB 서비스 선정 퀘스트 깨기::최유정::AWS Summit Seoul 2018 게임을 위한 최적의 AWS DB 서비스 선정 퀘스트 깨기::최유정::AWS Summit Seoul 2018
게임을 위한 최적의 AWS DB 서비스 선정 퀘스트 깨기::최유정::AWS Summit Seoul 2018 Amazon Web Services Korea
 
Oracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret InternalsOracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret InternalsAnil Nair
 
2019 의료IT의 미래 : 서울대학병원 정밀의학 데이터 플랫폼 도입 사례 - 김경환 실장, 서울대학병원 / 전제민 교육부문 이사, AWS...
2019 의료IT의 미래 : 서울대학병원 정밀의학 데이터 플랫폼 도입 사례 - 김경환 실장, 서울대학병원 / 전제민 교육부문 이사, AWS...2019 의료IT의 미래 : 서울대학병원 정밀의학 데이터 플랫폼 도입 사례 - 김경환 실장, 서울대학병원 / 전제민 교육부문 이사, AWS...
2019 의료IT의 미래 : 서울대학병원 정밀의학 데이터 플랫폼 도입 사례 - 김경환 실장, 서울대학병원 / 전제민 교육부문 이사, AWS...Amazon Web Services Korea
 
2023 COSCUP - Whats new in PostgreSQL 16
2023 COSCUP - Whats new in PostgreSQL 162023 COSCUP - Whats new in PostgreSQL 16
2023 COSCUP - Whats new in PostgreSQL 16José Lin
 
TF에서 팀 빌딩까지 9개월의 기록 : 성장하는 조직을 만드는 여정
TF에서 팀 빌딩까지 9개월의 기록 : 성장하는 조직을 만드는 여정TF에서 팀 빌딩까지 9개월의 기록 : 성장하는 조직을 만드는 여정
TF에서 팀 빌딩까지 9개월의 기록 : 성장하는 조직을 만드는 여정Seongyun Byeon
 
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon Web Services Korea
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLJim Mlodgenski
 
Hpe Data Protector troubleshooting guide
Hpe Data Protector troubleshooting guideHpe Data Protector troubleshooting guide
Hpe Data Protector troubleshooting guideAndrey Karpov
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceBrendan Gregg
 
메이븐파헤치기(김우용)
메이븐파헤치기(김우용)메이븐파헤치기(김우용)
메이븐파헤치기(김우용)우용 김
 
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기AWSKRUG - AWS한국사용자모임
 
Cache in API Gateway
Cache in API GatewayCache in API Gateway
Cache in API GatewayGilWon Oh
 
Mvcc (oracle, innodb, postgres)
Mvcc (oracle, innodb, postgres)Mvcc (oracle, innodb, postgres)
Mvcc (oracle, innodb, postgres)frogd
 
Oracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodOracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodLudovico Caldara
 
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트)
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트) 마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트)
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트) Amazon Web Services Korea
 

La actualidad más candente (20)

Vault Open Source vs Enterprise v2
Vault Open Source vs Enterprise v2Vault Open Source vs Enterprise v2
Vault Open Source vs Enterprise v2
 
스타트업 사례로 본 로그 데이터 분석 : Tajo on AWS
스타트업 사례로 본 로그 데이터 분석 : Tajo on AWS스타트업 사례로 본 로그 데이터 분석 : Tajo on AWS
스타트업 사례로 본 로그 데이터 분석 : Tajo on AWS
 
Oracle GoldenGate Demo and Data Integration Concepts
Oracle GoldenGate Demo and Data Integration ConceptsOracle GoldenGate Demo and Data Integration Concepts
Oracle GoldenGate Demo and Data Integration Concepts
 
[OpenInfra Days Korea 2018] Day 2 - E5: GPU on Kubernetes
[OpenInfra Days Korea 2018] Day 2 - E5: GPU on Kubernetes[OpenInfra Days Korea 2018] Day 2 - E5: GPU on Kubernetes
[OpenInfra Days Korea 2018] Day 2 - E5: GPU on Kubernetes
 
게임을 위한 최적의 AWS DB 서비스 선정 퀘스트 깨기::최유정::AWS Summit Seoul 2018
게임을 위한 최적의 AWS DB 서비스 선정 퀘스트 깨기::최유정::AWS Summit Seoul 2018 게임을 위한 최적의 AWS DB 서비스 선정 퀘스트 깨기::최유정::AWS Summit Seoul 2018
게임을 위한 최적의 AWS DB 서비스 선정 퀘스트 깨기::최유정::AWS Summit Seoul 2018
 
Oracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret InternalsOracle RAC 19c: Best Practices and Secret Internals
Oracle RAC 19c: Best Practices and Secret Internals
 
2019 의료IT의 미래 : 서울대학병원 정밀의학 데이터 플랫폼 도입 사례 - 김경환 실장, 서울대학병원 / 전제민 교육부문 이사, AWS...
2019 의료IT의 미래 : 서울대학병원 정밀의학 데이터 플랫폼 도입 사례 - 김경환 실장, 서울대학병원 / 전제민 교육부문 이사, AWS...2019 의료IT의 미래 : 서울대학병원 정밀의학 데이터 플랫폼 도입 사례 - 김경환 실장, 서울대학병원 / 전제민 교육부문 이사, AWS...
2019 의료IT의 미래 : 서울대학병원 정밀의학 데이터 플랫폼 도입 사례 - 김경환 실장, 서울대학병원 / 전제민 교육부문 이사, AWS...
 
2023 COSCUP - Whats new in PostgreSQL 16
2023 COSCUP - Whats new in PostgreSQL 162023 COSCUP - Whats new in PostgreSQL 16
2023 COSCUP - Whats new in PostgreSQL 16
 
TF에서 팀 빌딩까지 9개월의 기록 : 성장하는 조직을 만드는 여정
TF에서 팀 빌딩까지 9개월의 기록 : 성장하는 조직을 만드는 여정TF에서 팀 빌딩까지 9개월의 기록 : 성장하는 조직을 만드는 여정
TF에서 팀 빌딩까지 9개월의 기록 : 성장하는 조직을 만드는 여정
 
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
 
Hpe Data Protector troubleshooting guide
Hpe Data Protector troubleshooting guideHpe Data Protector troubleshooting guide
Hpe Data Protector troubleshooting guide
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for Performance
 
메이븐파헤치기(김우용)
메이븐파헤치기(김우용)메이븐파헤치기(김우용)
메이븐파헤치기(김우용)
 
153 Oracle dba interview questions
153 Oracle dba interview questions153 Oracle dba interview questions
153 Oracle dba interview questions
 
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기
Spark + S3 + R3를 이용한 데이터 분석 시스템 만들기
 
Cache in API Gateway
Cache in API GatewayCache in API Gateway
Cache in API Gateway
 
Mvcc (oracle, innodb, postgres)
Mvcc (oracle, innodb, postgres)Mvcc (oracle, innodb, postgres)
Mvcc (oracle, innodb, postgres)
 
Oracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodOracle Client Failover - Under The Hood
Oracle Client Failover - Under The Hood
 
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트)
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트) 마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트)
마이크로서비스 기반 클라우드 아키텍처 구성 모범 사례 - 윤석찬 (AWS 테크에반젤리스트)
 

Destacado

Awareness, Feedback, Self-regulation
Awareness, Feedback, Self-regulationAwareness, Feedback, Self-regulation
Awareness, Feedback, Self-regulationChristian Glahn
 
Go or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.comGo or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.comJohn Allspaw
 
Programação semana de extensão 2012 folia
Programação semana de extensão 2012 foliaProgramação semana de extensão 2012 folia
Programação semana de extensão 2012 foliaAdriana Rocha
 
Reconociendo habilidades
Reconociendo habilidadesReconociendo habilidades
Reconociendo habilidadescarorubi1
 
Saludo protocolar al cuerpo diplomático acreditado en panamá
Saludo protocolar al cuerpo diplomático acreditado en panamáSaludo protocolar al cuerpo diplomático acreditado en panamá
Saludo protocolar al cuerpo diplomático acreditado en panamámirepanama
 
Pelleting: The link between practice and engineering
Pelleting: The link between practice and engineeringPelleting: The link between practice and engineering
Pelleting: The link between practice and engineeringMilling and Grain magazine
 
Annual Subsea Meeting Registration Form
Annual Subsea Meeting Registration FormAnnual Subsea Meeting Registration Form
Annual Subsea Meeting Registration FormPaulo de Tarso Molina
 
Strategies for Vacant Properties - John Mills, Camelot Property Management
Strategies for Vacant Properties - John Mills, Camelot Property ManagementStrategies for Vacant Properties - John Mills, Camelot Property Management
Strategies for Vacant Properties - John Mills, Camelot Property Managementfhanley
 
FLOW - Far and Large Offshore Wind (Summary)
FLOW - Far and Large Offshore Wind (Summary)FLOW - Far and Large Offshore Wind (Summary)
FLOW - Far and Large Offshore Wind (Summary)NLandUSA
 
B2b newsletter_april
B2b newsletter_aprilB2b newsletter_april
B2b newsletter_aprilRene Jack
 
La gestión de la elección gd e e mail
La gestión de la elección gd e e mailLa gestión de la elección gd e e mail
La gestión de la elección gd e e mailmsanchezm
 
Aguahidratacion cuso4b
Aguahidratacion cuso4bAguahidratacion cuso4b
Aguahidratacion cuso4bJ M
 

Destacado (20)

Awareness, Feedback, Self-regulation
Awareness, Feedback, Self-regulationAwareness, Feedback, Self-regulation
Awareness, Feedback, Self-regulation
 
Go or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.comGo or No-Go: Operability and Contingency Planning at Etsy.com
Go or No-Go: Operability and Contingency Planning at Etsy.com
 
Programação semana de extensão 2012 folia
Programação semana de extensão 2012 foliaProgramação semana de extensão 2012 folia
Programação semana de extensão 2012 folia
 
Yoga ocular
Yoga ocularYoga ocular
Yoga ocular
 
Reconociendo habilidades
Reconociendo habilidadesReconociendo habilidades
Reconociendo habilidades
 
Saludo protocolar al cuerpo diplomático acreditado en panamá
Saludo protocolar al cuerpo diplomático acreditado en panamáSaludo protocolar al cuerpo diplomático acreditado en panamá
Saludo protocolar al cuerpo diplomático acreditado en panamá
 
Pelleting: The link between practice and engineering
Pelleting: The link between practice and engineeringPelleting: The link between practice and engineering
Pelleting: The link between practice and engineering
 
Annual Subsea Meeting Registration Form
Annual Subsea Meeting Registration FormAnnual Subsea Meeting Registration Form
Annual Subsea Meeting Registration Form
 
Strategies for Vacant Properties - John Mills, Camelot Property Management
Strategies for Vacant Properties - John Mills, Camelot Property ManagementStrategies for Vacant Properties - John Mills, Camelot Property Management
Strategies for Vacant Properties - John Mills, Camelot Property Management
 
EBT Health & Safety
EBT Health & SafetyEBT Health & Safety
EBT Health & Safety
 
Pulpo congelado en españa
Pulpo congelado en españaPulpo congelado en españa
Pulpo congelado en españa
 
FLOW - Far and Large Offshore Wind (Summary)
FLOW - Far and Large Offshore Wind (Summary)FLOW - Far and Large Offshore Wind (Summary)
FLOW - Far and Large Offshore Wind (Summary)
 
B2b newsletter_april
B2b newsletter_aprilB2b newsletter_april
B2b newsletter_april
 
Cardiodiagnostico
CardiodiagnosticoCardiodiagnostico
Cardiodiagnostico
 
Cyclus-US Portfolio
Cyclus-US PortfolioCyclus-US Portfolio
Cyclus-US Portfolio
 
La gestión de la elección gd e e mail
La gestión de la elección gd e e mailLa gestión de la elección gd e e mail
La gestión de la elección gd e e mail
 
Aguahidratacion cuso4b
Aguahidratacion cuso4bAguahidratacion cuso4b
Aguahidratacion cuso4b
 
Final.gest.hum.1
Final.gest.hum.1Final.gest.hum.1
Final.gest.hum.1
 
Ba38
Ba38Ba38
Ba38
 
Numero
NumeroNumero
Numero
 

Operational Efficiency Hacks Web20 Expo2009

  • 1. Operational Efficiency Hacks John Allspaw Operations Engineering, Flickr
  • 2. who am I? Manage the Flickr Operations group Wrote a geeky book:
  • 5. “Efficiencies” Doing more with the robots you’ve got Doing more with the humans you’ve got
  • 8. Some optimization “rules” - Don’t rely on being able to tweak anything. - Don’t waste too much time tuning when you have no evidence it’ll matter.
  • 10. Optimization “rules” “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” Knuth, (or Hoare)
  • 15. Optimization “rules” That doesn’t give us an excuse to be lazy and inefficient. We lean on the experience of people in the community for evidence that tuning(s) might be a worthwhile thing to do.
  • 16. Optimization “rules” “Yet we should not pass up our opportunities in that critical 3 percent.” Knuth, (or Hoare)
  • 17. So... [chart: left region labeled “obvious tuning wins”; right region labeled “OMG I'm wasting !@#$ time for no reason”; marker between them: “stop somewhere in here”]
  • 24. Our Context - 24 TB of MySQL data - 32k/sec of MySQL writes - 120k/sec of MySQL reads - 6 PB of photos - 10TB storage eaten per day - 15,362 service monitors (alerts)
  • 27. Infrastructure Hacks - Examples of what changing software can do (plain old-fashioned performance tuning) - Examples of what changing hardware can do (yay for Mr. Moore!)
  • 28. Leaning on compilers (synthetic PHP benchmarks, not real-world) (http://sebastian-bergmann.de/archives/634-PHP-GCC-ICC-Benchmark.html)
  • 29. PHP (real-world) php 4.4.8 to php 5.2.8 migration
  • 30. Can now handle more with less same taste, less filling
  • 34. Image Processing - 2004, Flickr was using ImageMagick for image processing (version 6.1.9) - Changed to GraphicsMagick, about 15% faster at the time (version 1.1.5) - Only need a subset of ImageMagick features anyway for our purposes
  • 35. Image Processing - OpenMP support (http://en.wikipedia.org/wiki/Openmp) - Allows parallelization of processing jobs, using multiple cores working on the same image - Some algorithms have more parallelization than others
  • 36. Image Processing - Test script - 7 large-ish DSLR photos - Cascade resizing each to 6 smaller sizes, semi-typical for Flickr’s workload - Each resize processed serially
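A test like the one on slide 36 could be sketched roughly as follows. This is not Flickr's actual script: the target sizes, file naming, and helper names are hypothetical, and it assumes the GraphicsMagick `gm` binary is installed. "Cascade" is read here as producing each smaller size from the previously generated larger one, which is an interpretation rather than something the slide states.

```python
import subprocess
import time

# Hypothetical bounding-box sizes, largest first; not Flickr's real size list.
TARGET_SIZES = [1024, 500, 240, 100, 75, 50]


def cascade_commands(source, sizes=TARGET_SIZES):
    """Build one `gm convert` command per target size.

    Each step reads the output of the previous (larger) step. That is one
    reading of "cascade resizing" and is an assumption of this sketch.
    """
    stem = source.rsplit(".", 1)[0]
    commands = []
    current_input = source
    for size in sizes:
        output = f"{stem}_{size}.jpg"
        # `-resize NxN` fits the image inside an N-by-N box, keeping aspect ratio.
        commands.append(
            ["gm", "convert", current_input, "-resize", f"{size}x{size}", output]
        )
        current_input = output
    return commands


def run_benchmark(photos):
    """Run every resize serially and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    for photo in photos:
        for command in cascade_commands(photo):
            subprocess.run(command, check=True)
    return time.perf_counter() - start
```

Calling `run_benchmark` with the same seven photos under different compilers, OpenMP settings, or CPUs would give comparable timings, since each run executes the same 42 resizes in the same order.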
  • 37. Image Processing compiler differences (GM version 1.1.14, non-OpenMP)
  • 38. Image Processing OpenMP differences OpenMP advantage (gcc 4.1.2, on quad core Xeon L5335 @ 2.00GHz)
  • 39. Image Processing CPU differences
  • 43. Diagonal Scaling - Vertically scaling your already horizontally-scaled nodes - a.k.a. “tech refresh” - a.k.a. “Moore’s Law Surfing”
  • 47. Diagonal Scaling
        We replaced 67 “old” webservers with 18 “new”:
        servers   CPUs per server   RAM per server   drives per server   total power (W) @60% peak
        67        2                 4GB              1x80GB              8763.6
        18        8                 4GB              1x146GB             2332.8
        ~70% LESS power
        49U LESS rack space
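The savings quoted on slide 47 can be re-derived from the table's own numbers. The sketch below uses only the figures from the slide; the function name is hypothetical, and the rack-space line assumes each server occupies 1U, which is consistent with the 49U figure but is not stated on the slide.

```python
def diagonal_scaling_summary(old_servers, old_cpus, old_watts,
                             new_servers, new_cpus, new_watts):
    """Compare two server fleets using the slide's per-fleet totals."""
    return {
        # Fraction of total power no longer drawn after the refresh.
        "power_saving": 1 - new_watts / old_watts,
        # Assumes 1U per server (an assumption, see lead-in).
        "rack_units_saved": old_servers - new_servers,
        "old_total_cpus": old_servers * old_cpus,
        "new_total_cpus": new_servers * new_cpus,
        "old_watts_per_server": old_watts / old_servers,
        "new_watts_per_server": new_watts / new_servers,
    }


summary = diagonal_scaling_summary(
    old_servers=67, old_cpus=2, old_watts=8763.6,
    new_servers=18, new_cpus=8, new_watts=2332.8,
)
for name, value in summary.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

Worked through, the power saving comes out at roughly 73%, in line with the slide's "~70%". Per-server draw is almost unchanged (about 130 W each), so the saving comes from needing far fewer boxes while the total CPU count goes up slightly, from 134 to 144.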
  • 54. Diagonal Scaling
        We replaced 23 “old” image processing boxes with 8 “new”:
        servers   photos/min   rack (U)   total power (W) @60% peak
        23        1035         23         3008.4
        8         1120         8          1036.8
        ~75% FASTER
        15U LESS rack space
        65% LESS power
        [photos: from this ... to this]
  • 58. What do you do with old/slow machines? - Liquidate - Re-purpose as dev/staging/etc - “offline” tasks
  • 64. Offline Tasks - Out-of-band/asynchronous queuing and execution system, for non-realtime tasks - See here: http://code.flickr.com/blog/2008/09/26/flickr-engineers-do-it-offline/ - See Myles Grant talk about it more here: http://en.oreilly.com/velocity2009/public/schedule/detail/7552
  • 65. Runbook Hacks “WTF HAPPENED LAST NIGHT?!”
  • 72. Why? As infrastructure grows, try to keep the Humans:Machines ratio from getting out of hand Some of the How: - teach machines to build themselves - teach machines to watch themselves - teach machines to fix themselves - reduce MTTR by streamlining
  • 75. Automated Infrastructure - If there is only one thing you do, automatic configuration and deployment management should be it. - See: - Opscode/Chef (http://opscode.com/) - Puppet (http://reductivelabs.com/products/puppet/) - System Imager/Configurator (http://wiki.systemimager.org)
  • 77. Time Machine time is cheaper than human time. If a failure results in some commands being run to ‘fix’ it, make the machines do it. (i.e., don’t wake people up for stupid things!)
  • 80. Aggregate Monitoring Don’t care about single nodes, only care about delta change of metrics/faults - Warn (email) on X % change - Page (wake up) on Y % change High and low water marks for some metrics
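The warn/page rule on slide 80 could look something like the sketch below. The function names, the thresholds in the usage line, and the handling of a zero baseline are all assumptions made for illustration; the slide only says to warn on X % change, page on Y % change, and keep high and low water marks for some metrics.

```python
def classify_change(previous, current, warn_pct, page_pct):
    """Return "ok", "warn", or "page" for the change between two aggregate readings.

    `previous` and `current` are cluster-wide totals (for example, requests
    per second summed over all nodes), not per-node values.
    """
    if previous == 0:
        # Percent change is undefined from a zero baseline; this sketch treats
        # any movement away from zero as page-worthy (an assumption).
        return "ok" if current == 0 else "page"
    change_pct = abs(current - previous) / abs(previous) * 100
    if change_pct >= page_pct:
        return "page"
    if change_pct >= warn_pct:
        return "warn"
    return "ok"


def outside_water_marks(value, low, high):
    """True when a metric has crossed its low or high water mark."""
    return value < low or value > high
```

For example, `classify_change(1000, 1150, warn_pct=10, page_pct=25)` yields `"warn"`: a 15% swing in the aggregate is worth an email, but not worth waking anyone up.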
  • 87. Self-Healing: Make service monitoring fix common failure scenarios and notify us about it later. Daemons/processes running on the machines take corrective action under certain conditions and report back with what they did. This can greatly reduce your mean time to recovery (MTTR).
  • 91. Basic Apache Example: 1. Webserver not running? 2. Under certain conditions, try to start it, and email that this happened. (I’ll read it tomorrow.) 3. Won’t start? Assume something’s really wrong, so don’t keep trying (email that, too).
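The three steps above might look roughly like the following Python sketch. Everything here is illustrative: `heal_webserver` and its callbacks are hypothetical names, the slides do not say which "certain conditions" gate the restart, and the process name `httpd` and the use of `pgrep` and `apachectl` are assumptions about the host.

```python
import subprocess
from typing import Callable


def heal_webserver(
    is_running: Callable[[], bool],
    start: Callable[[], None],
    notify: Callable[[str], None],
    max_attempts: int = 1,
) -> str:
    """Try a bounded number of starts, and report what happened either way."""
    if is_running():
        return "ok"
    for attempt in range(1, max_attempts + 1):
        start()
        if is_running():
            # Low urgency: an email that can be read tomorrow.
            notify(f"webserver was down; started it (attempt {attempt})")
            return "restarted"
    # Something is really wrong: stop trying and say so.
    notify(f"webserver did not start after {max_attempts} attempt(s); giving up")
    return "gave_up"


def httpd_is_running() -> bool:
    # Assumption: the process is called "httpd" (on some systems it is "apache2").
    result = subprocess.run(["pgrep", "-x", "httpd"], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def httpd_start() -> None:
    # Assumption: apachectl is on the PATH of the account running this check.
    subprocess.run(["apachectl", "start"], check=False)
```

A cron job or monitoring plugin could call `heal_webserver(httpd_is_running, httpd_start, send_email)`, where `send_email` is whatever mail hook is already in place. Keeping the checks as callbacks makes the give-up logic easy to test without touching a real server.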
  • 98. MySQL Self-Healing: Some MySQL issues “fixed” by the machines: kill long-running SELECT queries (marked safe to kill); queries not safe to kill are marked by the application as “NO KILL” in comments; run EXPLAIN on killed queries, and report the results; keep track of the query types and databases that need the most killing, and produce a “DBs that Suck” report.
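As a rough sketch of the selection step, the function below picks kill candidates out of rows shaped like the output of `SHOW FULL PROCESSLIST` (columns `Id`, `Command`, `Time`, `Info`). The function name, the time limit, and the exact form of the “NO KILL” marker are assumptions; the slides only say the marker lives in a query comment.

```python
from typing import Dict, Iterable, List

# Marker text from the slides; how the application embeds it is an assumption.
NO_KILL_MARKER = "NO KILL"


def queries_to_kill(processlist: Iterable[Dict], max_seconds: int) -> List[int]:
    """Return thread ids of long-running SELECTs that are not marked NO KILL."""
    victims = []
    for row in processlist:
        sql = (row.get("Info") or "").strip()
        if row.get("Command") != "Query":
            continue  # sleeping connections, replication threads, etc.
        if (row.get("Time") or 0) < max_seconds:
            continue  # not long-running yet
        if NO_KILL_MARKER in sql:
            continue  # the application asked us to leave this one alone
        if not sql.upper().startswith("SELECT"):
            continue  # conservative: never touch writes or anything unrecognized
        victims.append(row["Id"])
    return victims


if __name__ == "__main__":
    rows = [
        {"Id": 1, "Command": "Query", "Time": 120, "Info": "SELECT * FROM photos"},
        {"Id": 2, "Command": "Query", "Time": 120, "Info": "SELECT /* NO KILL */ * FROM photos"},
        {"Id": 3, "Command": "Query", "Time": 300, "Info": "UPDATE photos SET views = views + 1"},
    ]
    print(queries_to_kill(rows, max_seconds=60))
```

A caller would then issue `KILL <id>` for each returned thread, run `EXPLAIN` on the saved query text, and count kills per database to feed the report. The check is deliberately conservative: a query that starts with any other comment is left alone.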
  • 103. MySQL Self-Healing: Some MySQL replication issues “fixed” by the machines, by error: skip errors that can safely be skipped and restart slave threads; force a refetch of replication binlogs on 1064 (ER_PARSE_ERROR); re-run queries on 1205 (ER_LOCK_WAIT_TIMEOUT) and 1213 (ER_LOCK_DEADLOCK).
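The per-error handling above amounts to a lookup from the replication error number to an action. The sketch below is one way to write that down. The `Action` and `action_for` names are hypothetical. The 1064, 1205 and 1213 entries come from the slides. The skippable set is an assumption: it maps the phrases in the editor's notes (“can’t drop”, “already exists”, “duplicate key name”) onto MySQL error numbers 1091, 1050 and 1061, and the real list may differ.

```python
from enum import Enum


class Action(Enum):
    SKIP_AND_RESTART = "skip the statement and restart the slave threads"
    REFETCH_BINLOGS = "force a refetch of the replication binlogs"
    RERUN_QUERY = "re-run the failed query"
    ESCALATE = "leave it alone and tell a human"


# From the slides.
REFETCH_ERRORS = {1064}        # ER_PARSE_ERROR
RERUN_ERRORS = {1205, 1213}    # ER_LOCK_WAIT_TIMEOUT, ER_LOCK_DEADLOCK

# Assumption: a guess at the "safely skippable" errors named in the notes.
SKIPPABLE_ERRORS = {
    1050,  # ER_TABLE_EXISTS_ERROR ("already exists")
    1061,  # ER_DUP_KEYNAME ("duplicate key name")
    1091,  # ER_CANT_DROP_FIELD_OR_KEY ("can't drop")
}


def action_for(errno: int) -> Action:
    """Map a replication error number to the corrective action to attempt."""
    if errno in SKIPPABLE_ERRORS:
        return Action.SKIP_AND_RESTART
    if errno in REFETCH_ERRORS:
        return Action.REFETCH_BINLOGS
    if errno in RERUN_ERRORS:
        return Action.RERUN_QUERY
    # Anything not explicitly listed is not "fixed" automatically.
    return Action.ESCALATE
```

The error number would typically come from the `Last_Errno` field of `SHOW SLAVE STATUS`. Defaulting to `ESCALATE` keeps the automation from touching errors nobody has reasoned about.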
  • 107. Code and Config Deploy Logs: 1. ESSENTIAL 2. MANDATORY
  • 111. Communications: Internal IRC: for ongoing discussions; logged, so “infinite” scrollback. IM Bot (built on libyahoo2.sf.net): for production changes; broadcasts everything to all contacts; logged, and injected into IRC; IM status shows who is primary/secondary on-call. All of IRC and IM Bot slurped into a search index.
  • 119. [Annotated screenshot of the deploy log. Callouts: when; what; detailed what*; who; time of last deploy shown at the top of Ganglia. *The detailed entry also points to the commands that should be used to back out the changes.]
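The fields called out on the deploy log (when, what, detailed what, who, and how to back out) can be captured in a small record type. This is a hypothetical sketch, not the format Flickr used: the class name, field names and sample values are all made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class DeployLogEntry:
    when: datetime
    who: str
    what: str      # one-line summary
    detail: str    # the "detailed what"
    backout_commands: List[str] = field(default_factory=list)

    def summary(self) -> str:
        """One line suitable for broadcasting to IRC or an IM bot."""
        stamp = self.when.strftime("%Y-%m-%d %H:%M:%S")
        return f"{stamp} {self.who}: {self.what}"


def last_deploy(entries: List[DeployLogEntry]) -> DeployLogEntry:
    """Most recent entry, e.g. to show the time of last deploy on a dashboard."""
    return max(entries, key=lambda e: e.when)


if __name__ == "__main__":
    entries = [
        DeployLogEntry(datetime(2009, 6, 1, 10, 0), "alice", "config push",
                       "raised a connection limit", ["revert config change"]),
        DeployLogEntry(datetime(2009, 6, 1, 14, 30), "bob", "code deploy",
                       "photo page fixes", ["redeploy previous build"]),
    ]
    print(last_deploy(entries).summary())
```

Storing the back-out commands next to the change means whoever is on call can undo it without first reconstructing what was done.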
  • 124. IM Bot (timestamps help correlation); all IRC and IM bot traffic goes into a searchable history.
  • 129. Morals of Our Stories: Optimizations can be a Very Good Thing™. Weigh time spent optimizing against expected gains. Lean on others to learn what gains to expect in different scenarios. Use plain old-fashioned intuition.
  • 130. Some Wisdom Nuggets: Jon Prall’s 85 WebOps Rules: http://jprall.vox.com/library/post/85-operations-rules-to-live-by.html

Editor's notes

  1. Finding huge gains from tweaking gets harder as you turn the knobs and pull the levers.
  2. He had some evidence that suggested compilers and versions would make a difference for PHP.
  3. So, we did it. (not just for performance reasons, but still...) Will this happen to you, too? Maybe. Maybe not.
  4. Performance gains like this don’t come very often, for “free”.
  5. We don’t need 80% of what ImageMagick has.
  8. Cascading resizing: Original -> Large, Large -> Medium, Large -> Small, Medium -> Thumb, Medium -> Square.
  9. This was done before OpenMP support was in. Compilers and optimization flags can make a difference!
  10. Enter OpenMP. Example of how
  11. These have been the revisions of our image processing hardware over time. 4x faster
  12. Examples are contact notifications, large photoset deletions, etc.
  18. Runbook hacks: tuning the process of operations failure handling and mitigation.
  19. Low water marks as indirect trouble indicators.
  21. Can be as simple as cron jobs or Nagios plugins (NRPE or NSCA).
  27. Skippable errors: “can’t drop”, “already exists”, “duplicate key name”
  31. Simple tips and tricks that can help in fixing things when they break.
  32. This shouldn’t be considered optional.
  34. Our IRC and IM logs get injected into a search engine, almost exactly like Lucene.