Rails Operations
Lessons learned from deploying and managing hundreds of Rails applications




Thanks for coming out this morning. I know it’s hungover o’clock,
so it means a lot. You are dedicated, upstanding individuals.
Oh hi, I’m Josh
• @techpickles
• http://github.com/technicalpickles
• http://technicalpickles.com




I am from the internet
Awesomeness Engineer of Supreme Versatility II




My official title is Awesomeness Engineer of Supreme Versatility II.
(I was recently promoted.)
Managed hosting and operations



We’re mostly known for our hosting. What isn’t as well known is our managed services. For this, we engage more closely
with our customers.

When bringing on new managed customers, we work with them to spec out servers and review the application’s needs. We get
them up and running on these servers with our configuration management tool, Moonshine. And once deployed, we
provide 24x7 monitoring. If your server goes down, we let you know, and get it back online as soon as possible,
regardless of when it happens.

And that’s not all. Once live, we provide operational support. Anything from application performance analysis,
recommending architecture improvements, installing and managing new software on servers, or just being there to give
feedback on how the application is operating.

You can basically think of us as a Rails Operations company.
I’m talking about Rails Operations



Conveniently enough, I’m talking about Rails Operations today.
WTF is Rails Operations?



I found this hard to distill down to a simple statement.

I think it’s safe to say that the majority of us are developers. We write code, build applications, launch products.

In a lot of organizations, operations is something different. People associate operations with system administration. And
to an extent, this can be fairly accurate. Different people, different teams, different. As developers, we write some
code, toss it over the wall, and let _them_ handle it.

I think this is a bit flawed. The code you write has an operational impact. The systems you run it on have an
operational impact on your code. It’s a complex relationship, and when developer and operations teams are
separate, it’s hard to bridge the gap between them, since it’s neither’s responsibility.
Development and maintenance of a production Rails application




The simplest definition I’ve found is this.
Very important assumption
You develop code that will eventually go into production and, thanks in part to some business model, generate revenue




That is to say, you are part of some organization
Before we dig in too deep...
Let’s talk about the business. We need to start with where
development and operations fit within the rest of The Business.
Question
Does development generate revenue?
• Takes place on laptops, desktop machines,
  staging servers

• No real users
• Unknown if it truly works
 • Tests are green, but...
NO
but it CREATES
potential revenue
• Step 1: Development
• Step 2: .......
• Step 3: PROFIT
Question
Does operations generate revenue?
• Lives on servers located in data centers
  and clouds

• Real users
• Either code works, or it doesn’t
• Either the application is available or not
NO

Just because your application works in production, doesn’t mean
people are using it or buying your product.
but it PRESERVES potential revenue



If you have good operations, that means users will be able to see
your application working and actually be able to use it.
• Step 1: Development
• Step 2: Operations
• Step 3: ......
• Step 4: PROFIT!
Question
Uh, what generates revenue?
Million Dollar Question
• Working features (or at least that work
  enough)

• Infrastructure to keep the application up
  and running (or at least up enough)

• A business model
• Sheer determination
• Good luck
Lessons learned




Alright. I’ve given you a definition of Rails Operations, and had a
brief detour to talk about the business and where development
and operations fit into it.

Now for some lessons. Basically, I’ll be going over some patterns,
some antipatterns, and other practices and topics.
Common threads




Putting this all together, I kept coming back to some common
threads. That is, some ideas that apply to many aspects. I’m going
to start you off with a few together, and then just jump into the
lessons. We’ll probably pick up a few more along the way.
Give a damn

If you don’t care about what you’re doing, everything else I’m
talking about today probably doesn’t matter. I don’t think you
need to worry about this though, since you are here.
Earlier we talked about how operations preserves revenue. To that
end, our goal is to mitigate risk as much as makes sense.
Tradeoffs and compromise. Each possible solution has them. The
trick is understanding that there are tradeoffs. What tradeoffs you
make depends on what your priorities are. For example:

 *   Dollar signs
 *   Time
 *   Sanity
 *   Technical debt
 *   Higher risk
Configuration Management
Pattern
It’s about managing configuration. Duh.
You write code that manages your servers’ configuration



Take a moment to think about how you might describe a server to someone.
There’s plenty of nouns:

*   packages
*   users
*   files
*   cronjobs
*   services

And some verbs:

* running commands
• apache package is installed
• apache service is running
• deploy user exists
• cron jobs
• etc
• Moonshine
• Puppet
• Chef
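To make "code that manages configuration" a bit more concrete, here is a minimal sketch in plain Ruby. It is not the Moonshine, Puppet, or Chef API. `FakeServer`, `MANIFEST`, and `converge` are hypothetical names made up for illustration, and the "server" is just an in-memory object. The point is only that you declare the state you want (nouns), and applying it twice changes nothing the second time.

```ruby
require "set"

# Hypothetical, in-memory stand-in for a server. A real tool would talk to
# the package manager, the init system, and so on.
class FakeServer
  attr_reader :packages, :services, :users, :log

  def initialize
    @packages = Set.new
    @services = Set.new
    @users    = Set.new
    @log      = []
  end
end

# The manifest declares desired state; it is not a script of commands.
MANIFEST = [
  { type: :package, name: "apache2" },
  { type: :service, name: "apache2" },
  { type: :user,    name: "deploy"  },
].freeze

# Applies the manifest and returns how many changes were made.
def converge(server, manifest)
  changes = 0
  manifest.each do |resource|
    collection =
      case resource[:type]
      when :package then server.packages
      when :service then server.services
      when :user    then server.users
      else raise ArgumentError, "unknown resource type: #{resource[:type]}"
      end

    # Set#add? returns nil when the element is already present,
    # so resources that already exist are left alone.
    if collection.add?(resource[:name])
      server.log << "#{resource[:type]} #{resource[:name]}: created"
      changes += 1
    end
  end
  changes
end

server = FakeServer.new
puts converge(server, MANIFEST) # first run applies every resource
puts converge(server, MANIFEST) # second run finds nothing left to do
```

Being able to run the same description again safely is what lets one manifest serve both a brand new server and one that has been up for a year.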
Automation




Bootstrapping. Anyone that has set up a new server from scratch
can tell you... it’s time consuming, labor intensive, and error
prone.

Bootstrapping is just part of it, though, and it only ever happens
once. What’s more interesting is that you can use this to
manage your infrastructure as it evolves. Need to start using
redis? Just add it to your configuration management, and you’ll
have it next deploy.
The best way to illustrate why you should be using configuration management is to explore the
consequences of not using it.

Imagine it’s time to add a new application server. Your application is under heavy load, and needs this
server to be up and serving requests. How long will it take you to get it up? And how will you know it’s
set up correctly? If you’re doing this all manually, you can’t really know the answers to these questions.

Here’s another example. Adding a new dependency to your application. It can be a gem, a native
package, a new daemon, whatever. How do you ensure this gets on the server when you need it?
Deploy and pray? Log into the server and install it yourself? This sucks, and it’s kind of risky, especially if
you’re talking about production.
As always, there are tradeoffs to be made.

Setting up and learning how to do configuration management takes time. Time that could be
spent working on user-facing tasks.

The alternative is taking on the risk of having to cold deploy, or of having deploys fail because of missing
dependencies.

Usually, the balance is to take the risk and have it burn you enough times that it’s more
painful to keep going without configuration management than it is to stop and set it up.

If you already know it, it’s a no-brainer. Just DO IT.
Staging Servers
     Pattern
Preproduction servers




Staging servers are all about being a testbed between development and production.
Helps ensure correctness of deploy
configuration management + staging servers = VERY YES
If you use configuration management, and have staging servers,
then this is a huge win.

We talked about adding new dependencies earlier. If you are
doing configuration management, then staging is the first place
you can see if ur doing it right.
There’s basically no downside to using staging servers. The only
tradeoff though is that servers do cost dollar signs and staging
servers are no different. This leads us to a new thread...
Maths... look around you. In most cases, you can do some dollar sign math to justify costs of a thing. Let’s try this.

A staging server may cost $60/mo

But how can you calculate the cost of not having a staging server? Let’s assume that if you don’t have a staging server,
you’re bound to do a bad deploy that it could have prevented. Some code that doesn’t work outright, or is otherwise
flawed. Let’s say it causes an hour of downtime while you determine the problem and try to fix it. Do you know how much it
costs your business in lost revenue to be down an hour?

This is actually a pretty mature question, and I’d be surprised if many people can answer it off hand. In any event, I think
we can do some fuzzy math to say yeah, it probably is more than $60. If that’s the case, then one failed deploy a month is
enough to validate a staging server.
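The fuzzy math can be written down as a tiny Ruby sketch. Only the $60/month staging price comes from the talk. The revenue-per-hour figure, the method names, and the incident rate below are assumptions picked for illustration, so plug in your own numbers.

```ruby
# The $60/month figure is from the talk; everything else is an assumption.
STAGING_COST_PER_MONTH = 60.0

# Expected monthly loss from bad deploys that staging would have caught.
def expected_downtime_cost(revenue_per_hour:, hours_down_per_incident:, incidents_per_month:)
  revenue_per_hour * hours_down_per_incident * incidents_per_month
end

# True when the downtime you expect to avoid costs more than the server.
def staging_pays_for_itself?(**assumptions)
  expected_downtime_cost(**assumptions) > STAGING_COST_PER_MONTH
end

# Hypothetical business: $200/hour in revenue, one bad deploy a month,
# one hour of downtime each time.
assumptions = {
  revenue_per_hour: 200.0,
  hours_down_per_incident: 1.0,
  incidents_per_month: 1.0,
}

loss = expected_downtime_cost(**assumptions)
puts format("Expected monthly loss without staging: $%.2f", loss)
puts "Staging pays for itself: #{staging_pays_for_itself?(**assumptions)}"
```

With one hour-long incident a month, the break-even point is simply $60 of revenue per hour. Anything above that and the staging server is the cheaper option.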
Repeat after me
•   development

•   staging

•   production
capistrano-gitflow




Whenever possible, I like to enforce standards by means of automation.

For the flow of code from development -> staging -> production, we have capistrano-gitflow.
Originally done up by apinstein, I did some refactorings and cleaned it up enough to be usable as a
gem.

Effectively, this enforces development -> staging -> production. Whenever you deploy to staging, it
tags the current branch including information about the date, the user deploying, and a small blurb
about the changes. Assuming this is cool, you can promote a tag to production and go on from there.
If you haven’t deployed to staging yet, you’ll be prompted and it will default to using the last
production tag.
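As a rough illustration of the tagging idea, here is a plain Ruby sketch that builds a staging tag name from the date, the deploying user, and a short blurb, then derives a production tag from it. The tag format and the method names `staging_tag` and `production_tag` are assumptions for this example. They are not taken from capistrano-gitflow, whose actual format and promotion mechanics may differ.

```ruby
# Hypothetical tag format: staging-YYYY-MM-DD-HH-MM-SS-user-blurb.
def staging_tag(time:, user:, blurb:)
  # Turn the free-text blurb into something safe for a git tag name.
  slug  = blurb.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
  stamp = time.getutc.strftime("%Y-%m-%d-%H-%M-%S")
  ["staging", stamp, user, slug].reject(&:empty?).join("-")
end

# Promotion only accepts something that already went through staging.
def production_tag(staging_tag_name)
  unless staging_tag_name.start_with?("staging-")
    raise ArgumentError, "not a staging tag: #{staging_tag_name}"
  end
  staging_tag_name.sub(/\Astaging-/, "production-")
end

tag = staging_tag(
  time:  Time.utc(2011, 8, 27, 9, 30, 0),
  user:  "josh",
  blurb: "Fix signup form!"
)
puts tag
puts production_tag(tag)
```

Refusing to build a production tag from anything other than a staging tag is the automation doing the enforcing: there is no path to production that skips staging.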
Deploy early, deploy often
Pattern
A play on release early, release often.
Although technically, I guess it’s the same




It’s basically the same thing we hear in the open source
community.

The sooner you release code, the sooner you can validate it and
the sooner you can get feedback. Does it work? Does it not break
the entire site? Are users happy?
By deploying early and often, we’re also limiting risk. The fewer
changes that go out in a single deploy, the fewer things there are
that can possibly break. By waiting to deploy, you’re accumulating
a larger set of changes to deploy, and therefore there’s more
surface area to debug if it breaks.
In a way, you can consider undeployed code a liability.

Imagine spending a day or two doing some code cleanups to get ready for a sprint. Should you deploy
when you are done and happy with the refactorings, or should you go ahead and do your sprint?

If it were me, I’d deploy the refactorings first. That way, the code is out there, and you’ll know if it
performs as well as its nonrefactored version. It’s really easy to introduce performance-killing changes
in even a few-line diff.

If you instead wait and deploy with new features, if anything goes awry, you have significantly more
code to spelunk to track down a potential problem.
Feeling Driven Development
Antipattern
Oh feelings.
The front page feels slow
The primary key seems like it’s increasing rapidly
IO seems high
What does it even mean?



This drives me nuts. By saying something ‘feels’ slow, there’s an
implied assumption. The assumption is that it should be fast.
Saying it like that is...weird, because it gives no indication of what
is slow or not.

The trick is in determining what the assumption is, and then
finding a way to measure and identify the problem.

How can we do this?
Science Driven Development
Counterpattern
Metrics everywhere!




With the right tools, you can easily be continuously collecting data
so you have it in your pocket when you need it.
• New Relic - http://newrelic.com
• Scout - http://scoutapp.com




These are the two we use and highly recommend.

New Relic is really great for giving a high level view of your application. We’re talking at the request/response level,
including all sorts of fun maths with most time consuming requests, highest standard deviation, etc. It also breaks down
requests by where the time is spent: the view, the controller, the database, partials, and so on.

Scout is useful for other reasons. While New Relic is good for high level understanding of your application, Scout is a bit
more low level. You can use it to collect metrics about your servers, and how well they are running. Memory, CPU, disk
space, IO, mysql connection stats, and so on.

I really believe these are a great combination, because New Relic can point you in the direction of a problem area, and Scout
can help you understand what’s contributing to it at a system level.
The front page feels slow
The front page is taking 10 seconds to load, but we really need it to be loading in under 1 second
The primary key seems like it’s increasing rapidly
The primary key is at 90% of its maximum, up from 80% yesterday, and looks like it’ll run out overnight.
IO seems high
IO fluctuates up to 90% sometimes, but doesn’t appear to have a negative effect
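Turning the primary key feeling into a number takes very little code. The sketch below does a straight-line extrapolation from two daily samples, using the 80% and 90% readings from the slide. The method name `days_until_exhausted` is made up for this example, and linear growth is an assumption. Real growth is often bursty, so treat the result as a rough estimate.

```ruby
# Linear extrapolation from two samples of "fraction of the maximum id used",
# taken one day apart. Returns the number of days until 100%, or nil when
# usage is flat or shrinking.
def days_until_exhausted(yesterday:, today:)
  growth_per_day = today - yesterday
  return nil unless growth_per_day.positive?

  (1.0 - today) / growth_per_day
end

days = days_until_exhausted(yesterday: 0.80, today: 0.90)
puts format("About %.1f day(s) of primary keys left", days)
```

At ten points a day with ten points left, that is roughly one day of headroom, which is a statement someone can act on in a way that "seems like it’s increasing rapidly" is not.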
Monitoring
Topic
How do you know when everything is awful?
How would you prefer to know?
• Angry tweets
• Angry email from your boss
• You personally checking everything all the
  time
• An automated system to let you know
• Nagios
• Scout
What to monitor




It’s not a problem til it’s a problem
Define priority




Does it wake someone up?
Must be actionable
Single point of contact




If everything is awful, there needs to be a single point of contact. They
take point, acknowledge and begin looking into it. If need be,
bring on others.
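The "define priority" and "must be actionable" rules can be sketched as a small triage function. This is plain Ruby, not the Nagios or Scout configuration format. The `Check` struct, the thresholds, and the `:page` / `:email` / `:ok` outcomes are all assumptions chosen for illustration.

```ruby
# A hypothetical check: a measured value, two thresholds, and the action a
# human is expected to take when it fires.
Check = Struct.new(:name, :value, :warn_at, :page_at, :action, keyword_init: true)

# Decides whether a check wakes someone up (:page), can wait until morning
# (:email), or is fine (:ok). A check with no action is rejected outright,
# since an alert nobody can act on is just noise.
def triage(check)
  if check.action.to_s.strip.empty?
    raise ArgumentError, "#{check.name} has no action, so it should not alert"
  end

  if check.value >= check.page_at
    :page
  elsif check.value >= check.warn_at
    :email
  else
    :ok
  end
end

disk = Check.new(
  name:    "disk usage %",
  value:   97,
  warn_at: 80,
  page_at: 95,
  action:  "rotate logs or grow the volume"
)
puts triage(disk)
```

Keeping the paging threshold separate from the warning threshold is the "does it wake someone up?" question expressed as data.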
Vertical scaling
Pattern
Your app is slow. Now what?
Resources are (relatively) cheap
Developers are (relatively) expensive
Imagine having memory issues.
As always there’s a balance.

Remember, it’s a tradeoff to optimize for developer time by
vertically scaling. It buys you time to either deal
Hipster Stack
Antipattern
“I read a blog post about how mongo is totally web scale”
Cargo cult operations
Remember what’s important for the business? Do you want to
become the expert at <insert technology here>? Is it really the
most valuable thing you can be doing?
If you’re still going to go hipster...

• experiment in branches
• understand operational impact
• Staging!
Test in production
      Wait, what?
Further Reading

• Web Operations - John Allspaw and Jesse Robins
• Continuous Delivery - Jez Humble and David Farley
• “Web Operations for Developers 101”

http://www.amazon.com/Web-Operations-Keeping-Data-Time/dp/1449377440/ref=sr_1_1?s=books&ie=UTF8&qid=1314447411&sr=1-1

http://www.amazon.com/Continuous-Delivery-Deployment-Automation-Addison-Wesley/dp/0321601912/ref=sr_1_4?s=books&ie=UTF8&qid=1314447411&sr=1-4

http://www.paperplanes.de/2011/7/25/web_operations_101_for_developers.html
Fin.
Want to talk ops?
find me here
josh@railsmachine
@techpickles
Do you like these things?
• Rails
• Operations
• Ping Pong
• Beer
We are hiring

Rails Operations - Lessons Learned

  • 1. Rails Operations Lessons learned from deploying and managing hundreds of Rails applications Thanks for coming out this morning. I know it’s hungover oclock, so it means a lot. You are dedicated, upstanding individuals.
  • 3. • @techpickles • http://github.com/technicalpickles • http://technicalpickles.com I am from the internet
  • 4.
  • 5. Awesomeness Engineer of Supreme Versatility II My official title is Awesomeness Engineer of Supreme Versatility. 2. (I recently was promoted)
  • 6. Managed hosting and operations We’re mostly known for our hosting. What isn’t as well known is our managed services. For this, we engage more closely with our customers. When bringing on new managed customers, we work with them to spec out servers, review application’s needs. We get them up and running on these servers with our configuration management tool, moonshine. And once deployed, we provide 24x7 monitoring. If you’re server goes down, we let you know, and get it back online as soon as possible, regardless of when it happens. And that’s not all. Once live, we provide operational support. Anything from application performance analysis, recommending architecture improvements, installing and managing new software on servers, or just being there to give feedback on how the application is operating. You can basically think of us as a Rails Operations company.
  • 7. I’m talking about Rails Operations Conveniently enough, I’m talking about Rails Operations today.
  • 8. WTF is Rails Operations? I found this hard to distill down to a simple statement. I think it’s safe to say that the majority of us are developers. We write code, build applications, launch products. A lot of organizations, operations is something different. eople associate operations with system administration. And to an extent, this can be fairly accurate. Different people, different teams, different. As developers, we write some code, and toss it over wall, and let _them_ handle it. I think this is a bit flawed. The code you write has an operational impact. The systems you run it on have an operational impact on your code. It’s a complex relationship, and when developer and operations teams are separate, it’s hard to bridge the gap between, since it’s neithers responsibility.
  • 9. Development and maintenance of a production Rails application The simplest definition I’ve found is this.
  • 10. Very important assumption You develop code that will eventually go into production, and in part to some business model, generate revenue That is to say, you are part of some organization
  • 11.
  • 12. Before we dig in too deep...
  • 13. Let’s talk about the business. We need to start with where development and operations fit within the rest of The Business.
  • 15. • Takes place on laptops, desktop machines, staging servers • No real users • Unknown if it truly works • Tests are green, but...
  • 16. NO
  • 18. • Step 1: Development • Step 2: ....... • Step 3: PROFIT
  • 20. • Lives on servers located in data centers and clouds • Real users • Either code works, or it doesn’t • Either the application is available or not
  • 21. NO Just because your application works in production, doesn’t mean people are using it or buying your product.
  • 22. but it PRESERVES potential revenue If you have good operations, that means users will be able to see your application working and actually be able to use it.
  • 23. • Step 1: Development • Step 2: Operations • Step 3: ...... • Step 4: PROFIT!
  • 26. • Working features (or at least that work enough) • Infrastructure to keep the application up and running (or at least up enough) • A business model • Sheer determination • Good luck
  • 27. Lessons learned Alright. I’ve given you a definition of Rails Operations, and had a brief detour to talk about the business and where development and operations fit into it. Now for some lessons. Basically, I’ll be going over some patterns, some antipatterns, and other practices and topics.
  • 28. Common threads Putting this all together, I kept coming back to some common threads. That is, some ideas that apply to many aspects. I’m going to start you off with a few together, and then just jump into the lessons. We’ll probably pick up a few more along the way.
  • 29. Give a damn If you don’t care about what you’re doing, everything else I’m talking about today probably doesn’t matter. I don’t think you need to worry about this though, since you are here.
  • 30. Earlier we talked about how operations preserves revenue. To that end, our goal is to mitigate risk as much as makes sense.
  • 31. Tradeoffs and compromise. Each possible solution has them. The trick is understanding that there are tradeoffs. What tradeoffs you make depends on what your priorities are. For example: * Dollar signs * Time * Sanity * Technical debt * Higher risk
  • 33. It’s about managing configuration. duh.
  • 34. You write code that manages your servers’ configuration Take a moment to think about how you might describe a server to someone. There’s plenty of nouns: * packages * users * files * cronjobs * services And some verbs: * running commands
  • 35. • apache package is installed • apache service is running • deploy user exists • cron jobs • etc
  • 37. Automation. Bootstrapping. Anyone who has set up a new server from scratch can tell you... it’s time consuming, labor intensive, and error prone. Bootstrapping is just part of it though, and it only ever happens once. What’s more interesting is that you can use this to manage your infrastructure as it evolves. Need to start using redis? Just add it to your configuration management, and you’ll have it next deploy.
  • 38. The best way to illustrate why you should be using configuration management is to explore the consequences of not using it. Imagine it’s time to add a new application server. Your application is under heavy load, and needs this server to be up and serving requests. How long will it take you to get it up? And how will you know it’s set up correctly? If you’re doing this all manually, you can’t really know the answers to these questions. Here’s another example: adding a new dependency to your application. It can be a gem, a native package, a new daemon, whatever. How do you ensure this gets on the server when you need it? Deploy and pray? Log into the server and install it yourself? This sucks, and it’s kind of risky, especially if you’re talking about production.
  • 39. As always, there are tradeoffs to be made. Setting up and learning how to do configuration management takes time, time that could be spent working on user-facing tasks. Skipping it means taking on the risk of having to cold deploy, or having deploys fail because of missing dependencies. Usually, the balance is to take the risk until it has burned you enough times that going without configuration management is more painful than stopping to set it up. If you already know how to do it, it’s a no brainer. Just DO IT.
  • 40. Staging Servers Pattern
  • 41. Preproduction servers Staging servers are all about being a testbed between
  • 43. configuration management + staging servers = VERY YES. If you use configuration management, and have staging servers, then this is a huge win. We talked about adding new dependencies earlier. If you are doing configuration management, then staging is the first place you can see if you’re doing it right.
  • 44. There’s basically no downside to using staging servers. The only tradeoff though is that servers do cost dollar signs and staging servers are no different. This leads us to a new thread...
  • 45. Maths... look around you. In most cases, you can do some dollar sign math to justify costs of a thing. Let’s try this. A staging server may cost $60/mo. But how can you calculate the cost of not having a staging server? Let’s assume that if you don’t have a staging server, you’re bound to do a bad deploy that it could have prevented. Some code that doesn’t work outright, or is otherwise flawed. Let’s say it causes an hour of downtime while you determine the problem and try to fix it. Do you know how much it costs your business in lost revenue to be down an hour? This is actually a pretty mature question, and I’d be surprised if many people can answer it off hand. In any event, I think we can do some fuzzy math to say yeah, it probably is more than $60. If that’s the case, then one failed deploy a month is enough to justify a staging server.
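The fuzzy math above can be written down as a quick break-even check. This is only a sketch: the $60/month figure comes from the example, while the monthly revenue number and the method names are assumptions picked for illustration.

```ruby
# Only the $60/month staging cost comes from the example above.
# Every other figure here is a made-up assumption.
STAGING_COST_PER_MONTH = 60.0

# Lost revenue for a given amount of downtime.
def downtime_cost(revenue_per_hour, hours_down)
  revenue_per_hour * hours_down
end

# True when the downtime a staging server would have prevented
# costs more than the staging server itself.
def staging_pays_off?(revenue_per_hour, hours_down_per_month)
  downtime_cost(revenue_per_hour, hours_down_per_month) > STAGING_COST_PER_MONTH
end

monthly_revenue  = 100_000.0 # hypothetical business
hours_per_month  = 30 * 24
revenue_per_hour = monthly_revenue / hours_per_month

puts format("One hour down costs roughly $%.2f", revenue_per_hour)
puts "Staging server pays for itself: #{staging_pays_off?(revenue_per_hour, 1)}"
```

Plug in your own revenue number. If you can’t, that is the mature question from the slide, and it is worth finding the answer.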
  • 46. Repeat after me • development • staging • production
  • 47. capistrano-gitflow. Whenever possible, I like to enforce standards by means of automation. For the flow of code from development -> staging -> production, we have capistrano-gitflow. Originally done up by apinstein, I did some refactorings and cleaned it up enough to be usable as a gem. Effectively, this enforces development -> staging -> production. Whenever you deploy to staging, it tags the current branch, including information about the date, the user deploying, and a small blurb about the changes. Assuming this is cool, you can promote a tag to production and go on from there. If you haven’t deployed to staging yet, you’ll be prompted and it will default to using the last production tag.
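To make the tagging idea concrete, here is a rough sketch of building a tag from the date, the deploying user, and a short blurb. The tag layout and the `staging_tag` helper are assumptions for illustration only; they are not taken from capistrano-gitflow’s source, so check the gem for the real format.

```ruby
# Hypothetical helper: builds a staging tag name from date, user, and blurb.
# The "staging-DATE-USER-BLURB" layout is an assumption, not the gem's format.
def staging_tag(time, user, blurb)
  # Turn the blurb into something safe to use in a git tag name.
  slug = blurb.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
  "staging-#{time.strftime('%Y-%m-%d')}-#{user}-#{slug}"
end

tag = staging_tag(Time.utc(2011, 8, 27), "josh", "Fix signup form")
puts tag
```

Because the tag carries who deployed what and when, promoting to production becomes a matter of picking a tag that already survived staging.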
  • 48. Deploy early, deploy often Pattern
  • 49. A play on release early, release often. Although technically, I guess it’s the same. It’s basically the same thing we hear in the open source community. The sooner you release code, the sooner you can validate it and the sooner you can get feedback. Does it work? Does it not break the entire site? Are users happy?
  • 50. By deploying early and often, we’re also limiting risk. The fewer changes that go out in a single deploy, the fewer things there are that can possibly break. By waiting to deploy, you’re accumulating a larger set of changes to deploy, and therefore there’s more surface area to debug if it breaks.
  • 51. In a way, you can consider undeployed code a liability. Imagine spending a day or two doing some code cleanups to get ready for a sprint. Should you deploy when you are done and happy with the refactorings, or should you go ahead and do your sprint? If it were me, I’d deploy the refactorings first. That way, the code is out there, and you’ll know if it performs as well as its non-refactored version. It’s really easy to introduce performance-killing changes in even a few-line diff. If you instead wait and deploy with new features, and anything goes awry, you have significantly more code to spelunk through to track down a potential problem.
  • 54. The front page feels slow
  • 55. The primary key seems like it’s increasing rapidly
  • 57. What does it even mean? This drives me nuts. By saying something ‘feels’ slow, there’s an implied assumption. The assumption is that it should be fast. Saying it like that is...weird, because it gives no indication of what is slow or not. The trick is in determining what the assumption is, and then finding a way to measure and identify the problem. How can we do this?
  • 58. Science Driven Development Counterpattern
  • 59.
  • 60. Metrics everywhere! With the right tools, you can easily be continuously collecting data so you have it in your pocket when you need it.
  • 61. • New Relic - http://newrelic.com • Scout - http://scoutapp.com These are the two we use and highly recommend. New Relic is really great for giving a high level view of your application. We’re talking at the request/response level, including all sorts of fun maths with most time consuming requests, highest standard deviation, etc. It also breaks down requests by where time is spent: whether it’s all in the view, the controller, the database, partials, and so on. Scout is useful for other reasons. While New Relic is good for high level understanding of your application, Scout is a bit more low level. You can use it to collect metrics about your servers, and how well they are running. Memory, CPU, disk space, IO, MySQL connection stats, and so on. I really believe these are a great combination, because New Relic can point you in the direction of a problem area, and Scout can help you understand what’s contributing to it at a system level.
  • 62. The front page feels slow The front page is taking 10 seconds to load, but we really need it to be loading in under 1 second
  • 63. The primary key seems like it’s increasing rapidly The primary key is at 90% of its maximum, up from 80% yesterday, and looks like it’ll run out overnight.
  • 64. IO seems high IO fluctuates up to 90% sometimes, but doesn’t appear to have a negative effect
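Once a vague feeling becomes a measurement, you can do arithmetic on it. As a sketch, the primary key example above can be turned into a simple linear projection. The `days_until_exhausted` helper is hypothetical, and assuming growth stays constant is a simplification.

```ruby
# Linear projection of how long until a key space runs out.
# Inputs are percentages of the maximum value already used.
# Assumes growth continues at the same daily rate, which is a simplification.
def days_until_exhausted(percent_used_yesterday, percent_used_today)
  growth_per_day = percent_used_today - percent_used_yesterday
  return Float::INFINITY if growth_per_day <= 0 # not growing, no deadline

  (100 - percent_used_today).to_f / growth_per_day
end

# Figures from the slide: 80% yesterday, 90% today.
days = days_until_exhausted(80, 90)
puts format("About %.1f days of headroom left", days)
```

Ten points of growth per day with ten points left is about a day of headroom, which is the sort of number that decides whether someone gets woken up.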
  • 65.
  • 66. Monitoring Topic
  • 67. How do you know when everything is awful?
  • 68. How would you prefer to know? • Angry tweets • Angry email from your boss • You personally checking everything all the time • An automated system to let you know
  • 70. What to monitor It’s not a problem til it’s a problem
  • 71. Define priority Does it wake someone up?
  • 73. Single point of contact. If everything is awful, there needs to be a single point of contact. They take point, acknowledge, and begin looking into it. If need be, they bring on others.
  • 74.
  • 75. Vertical scaling Pattern
  • 76. Your app is slow Now what?
  • 77.
  • 81. As always there’s a balance. Remember, it’s a tradeoff to optimize for developer time by vertically scaling. It buys you time to either deal
  • 82. Hipster Stack Antipattern
  • 83.
  • 84. “I read a blog post about how mongo is totally web scale”
  • 86.
  • 87.
  • 88. Remember what’s important for the business? Do you want to become the expert at <insert technology here>? Is it really the most valuable thing you can be doing?
  • 89. If you’re still going to go hipster... • experiment in branches • understand operational impact • Staging!
  • 90. Test in production Wait, what?
  • 91. Further Reading • Web Operations - John Allspaw and Jesse Robbins: http://www.amazon.com/Web-Operations-Keeping-Data-Time/dp/1449377440/ref=sr_1_1?s=books&ie=UTF8&qid=1314447411&sr=1-1 • Continuous Delivery - Jez Humble and David Farley: http://www.amazon.com/Continuous-Delivery-Deployment-Automation-Addison-Wesley/dp/0321601912/ref=sr_1_4?s=books&ie=UTF8&qid=1314447411&sr=1-4 • “Web Operations for Developers 101”: http://www.paperplanes.de/2011/7/25/web_operations_101_for_developers.html
  • 92. Fin.
  • 93. Want to talk ops? find me here josh@railsmachine @techpickles
  • 94. Do you like these things? • Rails • Operations • Ping Pong • Beer We are hiring