13. @RealGeneKim
The IT Core Chronic Conflict
Every IT organization is
pressured to simultaneously:
Respond more quickly to urgent
business needs
Provide stable, secure and
predictable IT service
Source: The authors acknowledge Dr. Eliyahu Goldratt, creator of the Theory of Constraints and author
of The Goal, has written extensively on the theory and practice of identifying and resolving core,
chronic conflicts.
14. @RealGeneKim
Every Company Is An IT Company…
95% of all capital projects have an IT
component…
50% of all capital spending is
technology-related
We are here…
Where we need
to be…
IT is always in the way
(again…)
27. @RealGeneKim
Making Changes When It Matters Most
“By installing a rampant innovation culture,
we performed 165 experiments in the peak three
months of tax season.”
“Our business result? Conversion rate of the
website is up 50 percent. Employee result?
Everyone loves it, because now their ideas can
make it to market.”
–Scott Cook, Intuit Founder
28. @RealGeneKim
Who Is Doing DevOps?
Google, Amazon, Netflix, Etsy, Spotify, Twitter, Facebook …
Dynatrace, CSC, IBM, CA, SAP, HP, Microsoft, Red Hat …
GE Capital, Nationwide, BNP Paribas, BNY Mellon,
World Bank, Paychex, Intuit …
The Gap, Nordstrom, Macy’s, Williams-Sonoma, Target …
General Motors, Northrop Grumman, LEGO, Bosch …
UK Government, US Department of Homeland Security …
Kansas State University…
Who else?
29. @RealGeneKim
High Performers Are More Agile
30x more frequent deployments
8,000x faster lead times
than their peers
Source: Puppet Labs 2013 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic
30. @RealGeneKim
High Performers Are More Reliable
2x the change success rate
12x faster mean time to recover (MTTR)
Source: Puppet Labs 2013 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic
31. High Performers Win In The Marketplace
@RealGeneKim
2x more likely to exceed profitability,
market share & productivity goals
50% higher market capitalization growth
over 3 years*
Source: Puppet Labs 2014 State Of DevOps
32. @RealGeneKim
Organizations with high performing DevOps
organizations were 2.5x more likely to
exceed profitability, market share and
productivity goals…
…and had 50% higher market
capitalization growth over 3 years…
Source: Puppet Labs 2014 State Of DevOps
34. “This book will have a profound effect on IT,
just as The Goal did for manufacturing.”
–Jez Humble,
co-author Continuous Delivery
“This is the IT swamp draining manual for
anyone who is neck deep in alligators.”
–Adrian Cockcroft,
Cloud Architect at Netflix
“This is The Goal for our decade,
and is for any IT professional who wants
their life back.”
–Charles Betz, IT architect, author
“Architecture and Patterns for IT”
@RealGeneKim
40. @RealGeneKim
Create One Step Environment
Creation Process
Make environments available early in the
Development process
Make sure Dev builds the code and environment
at the same time
Create a common Dev, QA and Production
environment creation process
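A minimal sketch of what a common, one-step creation process could look like: every stage is built by the same function from the same spec, so Dev, QA and Production can only differ in size. The spec contents, the Environment type and create_environment are hypothetical names used for illustration, not something from the talk.

```python
from dataclasses import dataclass

# Hypothetical shared spec; the OS label, packages and version are made up.
BASE_SPEC = {
    "os": "ubuntu-14.04",
    "packages": ["nginx", "python3"],
    "app_version": "1.4.2",
}


@dataclass(frozen=True)
class Environment:
    name: str
    os: str
    packages: tuple
    app_version: str
    instance_count: int


def create_environment(name: str, instance_count: int = 1) -> Environment:
    """Single entry point for every stage; only the size may differ."""
    return Environment(
        name=name,
        os=BASE_SPEC["os"],
        packages=tuple(BASE_SPEC["packages"]),
        app_version=BASE_SPEC["app_version"],
        instance_count=instance_count,
    )


if __name__ == "__main__":
    for stage, size in [("dev", 1), ("qa", 2), ("production", 20)]:
        print(create_environment(stage, size))
```

Because there is no separate code path per stage, a change to the spec reaches Dev first and Production last through the same process.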
41. @RealGeneKim
If I had a magic wand,
I’d change the Agile sprints and
definition of “done”:
“At the end of each sprint, we must
have working and shippable code…
demonstrated in an environment
that resembles production.”
42. @RealGeneKim
How organizations achieve high performance
• 89% are using infrastructure version control
• 82% are using automated code deployments
Source: Puppet Labs 2012 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic
43. Deploy Smaller Changes, More Frequently *
@RealGeneKim
Source: http://www.facebook.com/note.php?note_id=14218138919
44. Deploy Smaller Changes, More Frequently *
@RealGeneKim
Decouple feature releases from code
deployments
Deploy features in a disabled state, using feature
flags
Require all developers check code into trunk
daily (at least)
Practice deploying smaller changes, which
dramatically reduces risk and improves MTTR
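A minimal sketch of deploying a feature in a disabled state, assuming a simple in-memory flag store. FLAGS, is_enabled and checkout are hypothetical names for illustration; real systems usually read flags from a config service so they can change without a deployment.

```python
# Hypothetical in-memory flag store (an assumption for this sketch).
FLAGS = {"new_checkout": False}


def is_enabled(flag: str) -> bool:
    """Return True if the named flag is on; unknown flags default to off."""
    return FLAGS.get(flag, False)


def checkout(cart_total: float) -> str:
    """Serve the new code path only when its flag is enabled."""
    if is_enabled("new_checkout"):
        return f"new checkout flow: {cart_total:.2f}"
    return f"legacy checkout flow: {cart_total:.2f}"


if __name__ == "__main__":
    print(checkout(19.99))        # code is deployed, feature is still dark
    FLAGS["new_checkout"] = True  # the "release" is a flag flip, not a deploy
    print(checkout(19.99))
```

The new code path ships to production with every trunk check-in, but customers only see it once the flag is turned on, and turning it off again is the rollback.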
45. Experiment: Reducing Batch Size By 50%
@RealGeneKim
And the customer got the feature in
half the time!
Source: Scott Prugh, Chief Architect, CSG, Inc.
46. @RealGeneKim
Breaking The Bottlenecks In The Flow
Environment creation
Code deployment
Test setup and run (mention @rohansingh)
Overly tight architecture
Development
Product management
47. @RealGeneKim
“In November 2011, running even the most minimal
test for CloudFoundry required deploying to 45 virtual
machines, which took a half hour. This was way too
long, and also prevented developers from testing on
their own workstations.
By using containers, within months, we got it down to
18 virtual machines so that any developer can deploy
the entire system to single VM in six minutes.”
— Elisabeth Hendrickson, Director of Quality
Engineering, Pivotal Labs
50. Top Predictors Of IT Performance (2014)
@RealGeneKim
Version control of all production artifacts
Continuous integration and deployment
Automated acceptance testing
Peer-review of production changes (vs. external
change approval)
High trust culture
Proactive monitoring of the production environment
Win-win relationship between Dev and Ops
Source: Puppet Labs 2014 State Of DevOps
51. @RealGeneKim
The First Way: Outcomes
Creating single repository for code and environments
Determinism in the release process
Developers can replay performance problems upstream
Consistent Dev, Test and Production environments, all properly
built before deployment begins
Features being deployed daily without catastrophic failures
Decreased lead time
Faster cycle time and release cadence
Technologies required:
Automated testing, traceability across builds
54. @RealGeneKim
How many times per day is the andon cord
pulled in a typical day at a Toyota
manufacturing plant?
3500 times per day
55. @RealGeneKim
Why would Toyota do something so disruptive as
stopping production thousands of times per day?
“It’s the only way we can build 2,000 vehicles
per day – that’s one completed vehicle every
55 seconds.”
56. @RealGeneKim
Google Dev And Ops (2013)
15,000 engineers, working on 4,000+ projects
All code is checked into one source tree
(billions of files!)
5,500 code commits/day
75 million test cases are run daily
"Automated tests transform fear into boredom."
-- Eran Messeri, Google
57. @RealGeneKim
Developers Carry Pagers
“We found that when we woke up developers at
2am, defects got fixed faster than ever”
– Patrick Lightbody,
CEO, BrowserMob
“You build it, you run it.”
– Werner Vogels
CTO, Amazon
58. @RealGeneKim
Developers Carry Pagers
“As a developer, there has never been a more
satisfying point in my career than when I wrote
the code, I pushed the button to deploy it,
I watched the metrics to see if it actually worked
in production, and fixed it if it broke.”
– Tim Tischler
Director of Operations Engr,
Nike, Inc.
61. @RealGeneKim
Pervasive Production Telemetry
“Having a
developer add a
monitoring metric
shouldn’t feel like
a schema
change.”
– John Allspaw,
SVP Tech Ops,
Etsy
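A minimal sketch of the idea that adding a metric should cost a developer one line, assuming a tiny in-process registry. COUNTS, TIMINGS, incr, timing and handle_login are hypothetical names for illustration; a real setup would ship these values to a metrics backend.

```python
import time
from collections import Counter

# Hypothetical in-process registry (an assumption for this sketch).
COUNTS = Counter()
TIMINGS = {}


def incr(name: str, value: int = 1) -> None:
    """Count an event; a new metric name needs no registration step."""
    COUNTS[name] += value


def timing(name: str, seconds: float) -> None:
    """Record one duration sample under the given metric name."""
    TIMINGS.setdefault(name, []).append(seconds)


def handle_login(user: str, password_ok: bool) -> bool:
    start = time.perf_counter()
    incr("login.attempt")      # one line per new metric
    if not password_ok:
        incr("login.failure")
    timing("login.duration", time.perf_counter() - start)
    return password_ok


if __name__ == "__main__":
    handle_login("alice", True)
    handle_login("bob", False)
    print(dict(COUNTS))
```

Metric names are created on first use, so instrumenting a new code path does not require a ticket, a schema, or another team.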
64. @RealGeneKim
Patterns: Shift Testing Left
Find performance problems early
Allow Dev into production environment (where
possible)
Enable Dev to re-create production errors in Test
68. Can Large Orgs Adopt These Practices?
@RealGeneKim
Source: Puppet Labs 2014 State Of DevOps
Yes!
(Automated testing,
Continuous integration,
proactive monitoring, even
high trust cultures!)
The only practice not being
adopted is Peer Review vs.
Change Approval!
69. Top Predictors Of IT Performance (2014)
@RealGeneKim
Version control of all production artifacts
Continuous integration and deployment
Automated acceptance testing
Peer-review of production changes (vs. external
change approval)
High trust culture
Proactive monitoring of the production environment
Win-win relationship between Dev and Ops
Source: Puppet Labs 2014 State Of DevOps
70. @RealGeneKim
The Second Way: Outcomes
Defects and security issues getting fixed faster than ever
Disciplined automated testing enabling many
simultaneous small, agile teams to work productively
All groups communicating and coordinating better
Everybody is getting more work done
Technologies required: …
71. The Third Way:
Continual Experimentation And Learning
@RealGeneKim
72. @RealGeneKim
Break Things Early And Often
“Do painful things more frequently, so you can
make it less painful… We don’t get pushback
from Dev, because they know it makes rollouts
smoother.”
– Adrian Cockcroft,
Former Architect, Netflix
(Now Technology Fellow,
Battery Ventures)
77. “By November 2011, Kevin Scott,
LinkedIn’s top engineer, had had
enough. The system was taxed as
LinkedIn attracted more users, and
engineers were burnt out.
“To fix the problems, Scott, who’d
arrived from Google that February,
launched Operation InVersion.
“He froze development on new
features so engineers could overhaul
the computing architecture.
“‘We had to tell management we’re
not going to deliver anything new
while all of engineering works on this
project for the next two months,’
Scott says. ‘It was a scary thing.’”
@RealGeneKim
85. @RealGeneKim
DevOps Enterprise Summit
Save the date: October 21-23, 2014
DevOps Enterprise is a conference for horses, by horses
Macy’s, Disney, GE Capital, Blackboard, Telstra, US Department of
Homeland Security, CSG, Raytheon, Ticketmaster, Union Bank of
California
Leaders driving DevOps transformations will talk about
The business problem they set out to solve
The obstacles they had to overcome
The business value they created
Register at http://devopsenterprise.io/
Use promo code “DYNATRACE20” (expires 10/20)
86. @RealGeneKim
Want More Information?
To receive the following:
A copy of this presentation
A 140 page excerpt of “The Phoenix Project”
More information on the DevOps Enterprise
Summit (20% discount: DYNATRACE20)
Join the reviewer list for our upcoming
“DevOps Cookbook”
Just pick up your phone, and send an email:
To: realgenekim@zip.sh
Subject: dynatrace
87. @RealGeneKim
Can Large Orgs Be High Performers?
Source: Puppet Labs 2014 State Of DevOps
Yes.
But orgs with 10,000+
employees 40% less likely
to be high performing vs.
500 employee orgs…
My name is Gene Kim. My area of passion started when I was the CTO and founder of Tripwire in 1999. I started keeping a list that we called “Gene’s list of people with great kung fu.” These were the organizations that simultaneously…
In the next 25 minutes, I’m really excited to share with you some of my key learnings, which I’m hoping will not only be applicable to you, but that you’ll be able to put into practice right away, and get some amazing results.
But let me tell you how my journey began…
Who are they auditing? IT operations.
I love IT operations. Why? Because when the developers screw up, the only people who can save the day are the IT operations people.
Memory leak? No problem, we’ll do hourly reboots until you figure that out.
Who here is from IT operations?
Bad day:
Not as prepared for the audit as they thought
Spending 30% of their time scrambling, generating presentations for auditors
Or an outage, and the developer is adamant that they didn’t make the change – they’re saying, “it must be the security guys – they’re always causing outages”
Or, there are 50 systems behind the load balancer, and six systems are acting funny – what’s different, and who made them different?
Or every server is like a snowflake, each having their own personality
We as Tripwire practitioners can help them make sure changes are made visible, authorized, deployed completely and accurately, find differences
Create and enforce a culture of change management and causality
EG Parts Unlimited, Inc. DBA Parts Unlimited is in serious trouble. Stock has tumbled 19% in the last 30 days, and is down 52% from its peak three years ago. The company continues to be outmaneuvered by their arch-rival, famous for their ability to anticipate and instantly react to customer needs. Parts Unlimited now trails the competition in sales growth, inventory turns and profitability.
Parts Unlimited has been promising the release of software called “Phoenix,” which – if they can ever get it released – should close the gap. It tightly integrates the company’s retailing and e-commerce channels. With the program already years late, many expect the company to announce another delay in their analyst earnings call next month. 20 million in, years late, and the Board and the investors are – let’s just say the natives are restless and are looking for heads. Which means not only have some of the players been let go or moved positions, but the board is looking at outsourcing and/or splitting up the company.
The board has given the team six months to make dramatic improvements.
Source: Flickr: birdsandanchors
Who’s introducing variance? Well, it’s often these guys. Show me a developer who isn’t causing an outage, I’ll show you one who is on vacation.
Primary measurement is deploy features quickly – get to market.
I’ve worked with two of the five largest Internet companies (Google, Microsoft, Yahoo, AOL, Amazon), and I now believe that the biggest differentiator to great time to market is great operations:
Bad day:
We do 6 weeks of testing, but deployment still fails. Why? QA environment doesn’t match production
Or there’s a failure in testing, and no one can agree whether it’s a code failure or an environment failure
Or changes are made in QA, but no one wrote them down, so they didn’t get replicated downstream in production
Believe it or not, we as Tripwire practitioners can even help them – make sure environments are available when we need them, that they’re configured correctly the first time, document all the changes, replicate them downstream
[ picture of messy data center ] Ten minutes into Bill’s first day on the job, he has to deal with a payroll run failure. Tomorrow is payday, and finance just found out that while all the salaried employees are going to get paid, none of the hourly factory employees will. All their records from the factory timekeeping systems were zeroed out. Was it a SAN failure? A database failure? An application failure? Interface failure? Cabling error?
So who are all these constituencies that we can help, and increase our relevance as Tripwire practitioners and champions?
How many people here are in infosec?
Goal: protect critical systems and data
Safeguard organizational commitments
Prevent security breaches, help quickly detect and recover from them
Bad day: no security standards
No one is complying
Yes, we’re 3 years behind. “Whaddya gonna do about it?”
Vs. we (Tripwire owner) can become more relevant and add value by helping infosec leverage all the configuration guidance out there
Measure variance between production and those known good states
Trust and verify that when management says, we’ve trued up the configurations, they’ve actually done it
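A minimal sketch of measuring variance against a known good state, assuming each server's configuration can be read as a flat dictionary. find_drift and the sample server names and values are hypothetical and only illustrate the comparison; this is not how Tripwire itself works.

```python
def find_drift(baseline: dict, servers: dict) -> dict:
    """Return {server: {key: (expected, actual)}} for servers that differ."""
    drift = {}
    for name, config in servers.items():
        diffs = {}
        # Check keys from both sides so extra and missing settings both show up.
        for key in sorted(baseline.keys() | config.keys()):
            expected = baseline.get(key)
            actual = config.get(key)
            if expected != actual:
                diffs[key] = (expected, actual)
        if diffs:
            drift[name] = diffs
    return drift


if __name__ == "__main__":
    # Made-up sample data for illustration.
    known_good = {"openssl": "1.0.1g", "port": 443}
    fleet = {
        "web01": {"openssl": "1.0.1g", "port": 443},
        "web02": {"openssl": "1.0.1e", "port": 443, "debug": True},
    }
    for server, diffs in find_drift(known_good, fleet).items():
        print(server, diffs)
```

Servers that match the baseline drop out of the report, which leaves only the ones that are acting differently and the exact settings that make them different.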
Why? Now, more than ever, there is an ever-increasing number of regulatory and contractual requirements to protect systems and data
There are many ways to react to this: like, fear, horror, trying to become invisible… All understandable, given the circumstances…
Because infosec can no longer take 4 weeks to turn around a security review for application code, or take 6 weeks to turn around a firewall change.
But, on the other hand, I think it will be the best thing to happen to infosec in the past 20 years. We’re calling this Rugged DevOps, because it’s a way for infosec to integrate into the DevOps process, and be welcomed. And not be viewed as the shrill hysterical folks who slow the business down.
Tell story of Amazon, Netflix: they care about availability, security
It’s not a push, it’s a pull – they’re looking for our help (#1 concern: fear of disintermediation and being marginalized)
Eran Feigenbaum
Director of Security, Google Enterprise