The “Holy Grail” of Dev/Ops
A practical guide to what we’ve done at Cloud Posse
Prepared by Erik Osterman
Cloud Posse, LLC June 2017
Democratization of Information
Story Time
About Me
● Former Director of Cloud Architecture, CBS Interactive in San Francisco
● Ran Operations for TV.com, Metacritic.com, and Clicker.com
● Worked with AWS since 2006 / Private Invite-only Beta
● Advise numerous successful venture backed startups
● Backend Software Developer, Open Source Advocate / Contributor
● Took ~2 years off to travel; visited ~30 countries
This Talk
● ~90 Minutes
● Q&A at the end
● Write questions in the chat
● Actionable, practical advice
● Collection of our “Best Practices”
Best Practices
(my) definition: An opinionated & proven strategy with specific tactics to help
achieve the objectives for some overarching goal.
Emulate Giants
Netflix
Google
Spotify
Twitter
Our Best Practices
Organizational
Software Development, CI/CD, Testing, QA
Infrastructure, Automation, Orchestration
Logging, Monitoring, Alerting, Escalation, Remediation
Security
The Organization
...it all starts here
Realize we’re different.
Managers vs Makers - We work differently
(Paul Graham - YCombinator Founder)
Makers plan in half-day blocks of time
Managers plan to minimize empty 15 minute slots in their calendar
Interrupts are costly for developers and therefore the business
HumanOps (i.e. not cyborgs)
Humans get tired and stressed, they feel happy and sad.
Human issues are system issues.
Human health impacts business health.
Humans need to switch off and on again (aka sleep).
Humans build and fix systems.
Humans > systems
http://www.humanops.com/
Right Tools for the Job
Email == external communication
(not tasks, threaded conversations, cat pics)
Slack == all internal communications; channels for topics #dogs
Quip == all documentation for transparency
(engineering & business)
Zoom == reliable cross-platform conferencing
Asana == issue tracking
Technical Debt is Real
Tradeoffs are inevitable. Pay the tax now or later.
Later usually means bankruptcy & software rewrites
Includes upgrades, refactoring, optimizations, etc
It’s anything that doesn’t move the product forward
But it will hold the product back
This is not just a software problem.
It’s a business problem too.
...and unavoidable
Software Development
Cloud Native Design - the “12 Factor” Pattern
Stable Code Requires Feature Branching / Pull Requests / Code Reviews
Versioning / Version Pinning
Logging
Local Development Environments
Some Bad Practices
Cowboy Coding, committing to master
Hardcoding secrets, hostnames, paths, etc
“Clever” code is often “complicated” code
Writing un-greppable code, terse variable names, inconsistent naming
conventions, long functions, and... you get the point.
Using tabs :P
Some Good Ones….
Strict Linting (e.g. eslint, golint)
Semantic Versioning (semver)
.editorconfig (tabs or spaces? http://editorconfig.org/)
Seed project repositories
CHANGELOG.md
Best Practice: Open Source Pattern*
Leads to much cleaner code with fewer proprietary dependencies
Fewer proprietary dependencies makes it more reusable across projects
If you decide to release it, it demonstrates the kind of engineering you do
It works because the developer’s ego is on the line to write stuff that doesn’t suck
Pro tip: follow the conventions of your favorite framework or package system
* Does not require that organization releases code as open source
Best Practice:
README.md & CHANGELOG.md
Use well-formed Markdown syntax (.md)
Write “README” files on all your projects. Explain the purpose of the project
Show how to get started and where to look for more information
Document breaking changes & upgrade path in CHANGELOG.md
Pro tip: Use a markdown editor if you’re not familiar with the syntax
Best Practice: Use Makefiles
Provide targets for common usage
E.g. deps, build, run, clean
Include them with all repos
Document each target’s purpose (##)
Makefile Example
-include .secrets
DB_HOST ?= localhost
.PHONY: build run test
## build a docker image
build:
	docker build -t cloudposse/test .
## run container
run:
	docker run -v $$(pwd):/app \
		-e DB_HOST=$(DB_HOST) \
		-e DB_PASS=$(DB_PASS) \
		-p 8080:80 \
		cloudposse/test
## test
test:
	curl http://localhost:8080/
Best Practice: Local Dev Environments
Onboarding new hires should take minutes not hours
Use fully automated local dev environments
Use same Docker images that will run in staging/production
Bind-mount local volumes to speed up iterations for “live editing”
Pro Tip: Use docker-compose rather than Vagrant, which is too heavy
Best Practice: Developers write Dockerfiles
Always use alpine:3.5 Base images (be wary of unofficial images)
Declare all ENV in Dockerfile (like function arguments to an OS)
Write as few layers as possible (chain with && )
Version Pin Everything
Use 2-stage build process for thin images (C/C++, Golang)
Best Practice: Branch Protection
Essential for security and stability of your codebase
Require PR approval to merge to master
Force branches to be up-to-date
Disallow commits to master
Restrict to squash+merge
Best Practice: Pull Requests
The smaller the better; implement exactly 1 feature
Milestones
Use Labels:
Define PULL_REQUEST_TEMPLATE (## what, ## why, ## dependencies)
Use checkboxes for TODOs
….for clean commit histories in master
What a PR should look like....
Best Practice: Follow PRs with Trailer
http://ptsochantaris.github.io/trailer/
Best Practice: Application Logging
Use JSON structured log events
Libraries will efficiently generate/parse
Human readable, highly consistent
Pro tip: use Sentry to aggregate errors+warnings and log them in issue tracker
Sentry.io
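As a rough illustration of JSON structured log events, here is a minimal sketch using only Python's standard `logging` module. The `JsonFormatter` class, the `build_logger` helper, and the `context` key are names made up for this example; they are not taken from the talk or from Sentry.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        event = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed as extra={"context": {...}} are merged into the event.
        context = getattr(record, "context", None)
        if isinstance(context, dict):
            event.update(context)
        if record.exc_info:
            event["exception"] = self.formatException(record.exc_info)
        return json.dumps(event, sort_keys=True)


def build_logger(name, stream=sys.stdout):
    """Return a logger that writes JSON events to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

Usage would look like `build_logger("checkout").info("order placed", extra={"context": {"order_id": 42}})`, which emits a single line that both humans and log aggregators can parse.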
Best Practice: Pair Programming
Lose: speed (arguably)
Gain: fewer bugs, business continuity, education, team building/camaraderie
When: implementing complicated features, onboarding, and triaging
Pro tip: Use tmate for instant terminal sharing (https://tmate.io/)
QA
Developers with a focus on test automation
Quality Control
Masters of CI/CD
Best Practice: Bug Blowouts
Set aside 1 day per week to dogfood your own app
Prepare test scripts (aka flows) for everyone to follow
Get everyone on board, not just QA.
That means developers, graphic artists, customer support, etc
Monitor logs, submit bugs immediately to issue tracker
Best Practice: Synthetic Testing
Continuous Testing of Critical User Paths
Uses Browser to Automate Tests of Production
Ensure User Registrations, Password Resets, Shopping Carts, and Checkout
work 100% of the time
Pro Tip: Check out Selenium or PhantomJS
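The idea of continuously probing critical user paths can be sketched without a browser. The example below swaps browser automation for plain HTTP status checks, so it is a simplification of what Selenium or PhantomJS would do; `http_status`, `failing_paths`, and the example paths are hypothetical names chosen for illustration.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status for a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, or timeout


def failing_paths(base_url, paths, fetch=http_status):
    """Probe each critical path and return {path: status} for the failures."""
    failures = {}
    for path in paths:
        status = fetch(base_url.rstrip("/") + path)
        if status != 200:
            failures[path] = status
    return failures
```

Run on a schedule, a non-empty result from `failing_paths("https://example.com", ["/register", "/cart", "/checkout"])` is the signal to alert on. The `fetch` parameter exists so the check can be exercised without network access.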
Cloud Native Design
Service-Oriented Architectures (SOA)
Single-purpose Services (aka micro services)
Connected through APIs
Highly Decoupled
12 Factor Pattern
“12 Factor” in a Nutshell
Use Environment Variables for all configuration
(credentials, ports, tuning parameters, etc)
Use Backing Services for everything durable
Write all services as stateless & disposable
Automate all admin tasks
(the rest is meh)
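A minimal sketch of the "environment variables for all configuration" point, assuming a small service that needs database settings. `DB_HOST` and `DB_PASS` mirror the Makefile example earlier in the deck; `DB_PORT`, `DEBUG`, and the `Settings` class are assumptions added for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    db_host: str
    db_port: int
    db_pass: str
    debug: bool


def load_settings(env=os.environ):
    """Build Settings from environment variables, failing fast on a missing secret."""
    if "DB_PASS" not in env:
        raise RuntimeError("DB_PASS must be set in the environment")
    return Settings(
        db_host=env.get("DB_HOST", "localhost"),
        db_port=int(env.get("DB_PORT", "5432")),
        db_pass=env["DB_PASS"],
        debug=env.get("DEBUG", "false").lower() in ("1", "true", "yes"),
    )
```

Because nothing is read from files baked into the image, the same container can run unchanged in dev, staging, and production; only the environment differs.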
Best Practice: X509 Client Certificates
Use CA to Sign SSL Certificates that perform certain functions
Automatic transport & endpoint security for APIs
Highly scalable - no API requests to validate tokens
Don’t Rely on API tokens which are costly to authenticate and don’t secure the
transport layer
Examples: Kubernetes APIs, etcd
CI/CD
Frequency reduces Difficulty. The more you deploy, the easier it gets.
Latency between check-in and production is risky. It’s like HFT.
Faster delivery improves software development practices
Consistency improves confidence
Best Practice: Safe Schema Migrations
Ensure applications support the same backend schema for adjacent releases
Use feature flags to enable new features of backend schemas
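The schema-compatibility and feature-flag points on this slide can be sketched roughly as follows. This assumes a hypothetical migration that splits a `full_name` column into `first_name` and `last_name`; the column names, the `FEATURE_SPLIT_NAME` variable, and both functions are invented for the example.

```python
import os


def flag_enabled(name, env=os.environ):
    """Read a feature flag from the environment, e.g. FEATURE_SPLIT_NAME=1."""
    return env.get(f"FEATURE_{name.upper()}", "0") == "1"


def display_name(row, env=os.environ):
    """Read a user row that may come from the old or the new schema."""
    # The new columns are used only when the flag is on AND the migration
    # has already added them; otherwise the old column keeps working.
    if flag_enabled("split_name", env) and "first_name" in row:
        return f"{row['first_name']} {row['last_name']}"
    return row["full_name"]
```

The same release therefore runs against the schema before and after the migration, and the flag can be switched off again without a rollback.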
Best Practice: Use a Build Harness
Write terse .travis.yml, circle.yml, Jenkinsfile
Use the same targets in all projects
Use Makefile to automate build, test
Clone harness repo after git checkout
Example: https://github.com/cloudposse/build-harness
Best Practice: Liberal Tagging
Tag all docker images with multiple tags, in addition to release tags
Let $ref = {branch|tag}
Then, tag
$ref
$ref-$build
$git_hash
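The tagging scheme above could be computed along these lines. This is a sketch, not the build-harness implementation; the `docker_tags` function and the normalization of slashes in branch names are assumptions added here, since Docker tags only allow letters, digits, underscores, periods, and hyphens.

```python
import re


def docker_tags(ref, build, git_hash):
    """Return the tags for one build: $ref, $ref-$build, and $git_hash."""
    # A branch such as "feature/login" is not a valid Docker tag,
    # so disallowed characters are replaced with "-".
    safe_ref = re.sub(r"[^A-Za-z0-9_.-]", "-", ref)
    return [safe_ref, f"{safe_ref}-{build}", git_hash]
```

For example, `docker_tags("master", 42, "3f2a1bc")` yields `master`, `master-42`, and `3f2a1bc`, so an image can be located by branch, by exact build, or by commit.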
#DevOps
It is not…
a) A dedicated team within the organization
b) A job title
c) A sysadmin
d) A skill
e) all the above
The Old Paradigm
What it actually is...
A cross-disciplinary engineering culture
Infrastructure is Code
Automation over toil
A path towards “Serverless” (but we’re still far away!)
Site Reliability Engineering (“SRE”)
Infrastructure as Code
Infrastructure is now 100% API driven
“Best Practices” of Development → Infrastructure
Versioned Infrastructure
Automated Remediations
Best Practice: Automated Orchestration
Use Terraform to fully orchestrate environments
(e.g. DNS, instances, volumes, AutoScaling Groups, Load Balancers, Databases)
S3 remote backends to store state for collaboration and backups
Use modules to encapsulate business logic for consistency / manageability
Version pin modules and dependencies to ensure stability
Best Practice: Tools as Containers
Only local dependency should be docker and maybe make =)
Distribute all other local development tools or dependencies as containers
(e.g. terraform, aws, kops, helm, etc...)
Easier to standardize on one OS
Example: https://github.com/cloudposse/geodesic/
Best Practice: 100% Isolation
Use (1) AWS Account per Stage (E.g. production, staging, dev)
Use (1) VPC per Cluster
Use (1) Dedicated TLD per AWS Account
(e.g. foobar.com, foobar.qa, foobar.org)
Use (1) Single Process Containers for all Apps
Best Practice: Identical Environments
Environments should only differ in size, not shape
“Production”, “Staging”, “Dev” are only labels
Run as many parallel environments as we need
Only manual action is initiating build
E.g. other labels: pentest, loadtest, erik
Pro tip: each environment gets its own DNS zone (e.g. erik.cloudposse.org)
What We Want
Reliable - we want things to be online 100% of the time and when things go
wrong, we want them to auto-heal.
Fast - we want to run a site that can scale horizontally as traffic increases
Easy - we shouldn't need rocket scientists to operate it on a day-to-day basis
Affordable - we want it to be easy and cost effective to maintain in the long run
Maintainable - we want to have a development or staging environment that is
identical to production, so we can efficiently work on new versions of the site
without it affecting production
Secure - we don't want to get hacked
Technically, we need this… “Everything”
Horizontal Auto Scaling, Auto Healing, Auto DNS, Auto SSL
Automated deployments and rollbacks, Versioned History
Service Discovery & Load Balancing
Batch Job, Scheduled Job Execution
Storage/Volume Orchestration
...out of the box
Best Practice: Use Kubernetes (sometimes)
Ideally suited for microservices architectures, larger engineering teams
“Infrastructure as Code” - write documents that describe your microservices
(Pods ~ VMs, ReplicaSets ~ clusters, Services ~ Load Balancers)
Comes with Everything out-of-the-box
Cons: more complex to get started, difficult to triage issues, requires SME
Pro tip: Use kops to spin up clusters automatically in AWS and GCE
Kubernetes Dashboard
Best Practice: Use Elastic Beanstalk
Ideally suited for monolithic architectures
Comes with almost Everything out-of-the-box
Supports instances inside private VPC with root SSH access
Formal process for promoting code to production / automatic rollbacks
Pro tip: Use terraform to spin up beanstalk clusters automatically in AWS
Elastic Beanstalk
Configuration Management
Immutable vs Mutable
Declarative vs Imperative
“WYSIWYG”
Best Practice: Immutable Containers/AMIs
Like “Burning” a copy of your code in an image
Easy to know exactly what is running
Fast to deploy and rollback
Use Docker containers for applications
Use something like CoreOS for underlying host (~dom0)
Best Practice: Imperative Infrastructure
“Give me a load balancer, 2 filesystems, 2 GB ram, 4 CPUs, 4 instances”
There’s no guess work about what is output
Compatible with legacy architectures
There’s less magic
Monitoring
Application - Synthetic Testing
Infrastructure
Real-User Monitoring (RUM)
SLI
Systems don't have feelings. They only have SLAs.
Best Practice: Team Dashboards
Display Service Level Indicators (~ KPIs) relevant for specific teams
Create dashboards for specific services like Kafka and Zookeeper
First place to look when triaging issues
Pro tip: Use Datadog dashboards with namespace filtering on clusters
Sample Dashboard Overview
Alerting
Alert Fatigue == Human Fatigue
Dashboards > Alerts > Email
Human health impacts business health.
Budgets
Best Practice: Actionable Alerts
Metrics driven; not log events
Alerts need to be actionable - with links to documentation
PROBLEM: NAGIOS is EVIL
Escalation & Remediation
Automate as much as possible, escalate to a human as a last resort.
KPI~SLI / SLO / SLA
On-call Engineers
PagerDuty - Manage Calendars and Phone/SMS Escalations
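To make the SLI / SLO relationship and the idea of budgets concrete, here is a small worked calculation. It assumes an availability SLO expressed as a fraction over a rolling window; the function names and the 30-day default are illustrative choices, not figures from the talk.

```python
def error_budget_minutes(slo, window_days=30):
    """Minutes of allowed unavailability for an availability SLO over a window."""
    if not 0 < slo <= 1:
        raise ValueError("slo must be a fraction in (0, 1]")
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo, downtime_minutes, window_days=30):
    """Budget left after the downtime observed so far (negative means overspent)."""
    return error_budget_minutes(slo, window_days) - downtime_minutes
```

A 99.9% SLO over 30 days allows about 43.2 minutes of downtime. Alerting on how fast that budget is being spent is one way to keep pages actionable rather than noisy.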
Best Practice: #OCE Slack Channel
One channel to reach engineers
Searchable history of events and conversations
Use topic to announce who is on-call
Linked Google Calendar with Relevant Events (E.g. Customer Demo Calendar)
Best Practice: Post-Mortems
Kill the shame game. Human issues are system issues.
5 Whys - Root Cause Analysis (“RCA”)
Use Consistent Template (KISS)
Weekly Retrospectives with past OCEs and Stakeholders
Documented in Quip → Instantly Searchable
Pro Tip: Check out how Google does it:
https://landing.google.com/sre/book/chapters/postmortem-culture.html
Security
100% Security Cannot Be Achieved
Assume systems are insecure
Devalue credentials with MFA
What not to do...
1. Store secrets in git repository
2. Hardcode secrets in configurations
3. Write them in plain-text
4. Manually distribute them
5. Reuse/share keys across users and apps
6. Build homegrown systems to protect secrets
(* unless you’re Netflix, Hashicorp or Google)
...but you already knew that!
Best Practice: Beyond Corp Model
Enterprise zero-trust security model used by Google
Shift access controls from the network perimeter to individual devices/users
Allow employees to work more securely from any location
Do not rely on traditional VPNs
Best Practice: Identity-Aware Proxy (IAP).
Protect internal services using an IAP
Integrates cleanly with your SSO provider
MFA
Pro tip: Use the Bitly OAuth2 Proxy to add auth layer to any service
Best Practice: Bastion Host
Centralized point for accessing systems
Session logs, Slack Login Notifications
Require MFA to authenticate
Disable proxy mode and TCP socket forwarding
Use bastion only for triage, not administration (because that’s scripted!)
Pro Tip: Use Duo Push Notifications + Geofencing
Best Practice: Login Justifications
Best Practice: SSH Key Management
2 options - Github Public Key API or Signed Certificates
● You can’t protect the private key
● You can add multiple factors (a.k.a. MFA)
● Our Solution
○ Use Github Public Key API to distribute public keys
https://github.com/cloudposse/github-authorized-keys
○ Use Duo for MFA Push Notifications + Geofencing
https://github.com/cloudposse/bastion
Pro tip: Check out Bless by Netflix
Duo Slack Integration and Dashboard
Best Practice: SSM Scripted Remediations
Use SSM to execute commands in parallel across machines
(don’t use parallel ssh since that is harder to audit)
Full audit logs of command and output
Use IAM roles to restrict execution
Pro tip: use the aws cli to trigger remediations on the command line
Best Practice: Federated Accounts
Reduce the blast radius when things explode
Use one account per environment: dev, staging, production
Use one account for billing aggregation, IAM federation
Assumed Roles (e.g. read-only, admin, dba)
MFA required to assume roles - to devalue credentials
Pro Tip: Use STS API with MFA to generate short lived AWS credentials
Example: https://github.com/cloudposse/aws-assumed-role
AWS
Best Practice: AWS Secrets (Client-side)
Client Side (e.g. Terraform, AWS Cli)
● IAM User Account Access Keys (never shared!)
● Access Keys only permit Assume Role+MFA
● Assumed Roles (limit scope)
● Temporary Session Tokens with STS (expire after 1 hour)
● MFA (devalue credentials)
Solution: https://github.com/cloudposse/aws-assumed-role
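The assume-role-with-MFA flow listed above can be sketched as a thin wrapper around the AWS CLI. This is not how cloudposse/aws-assumed-role is implemented; it only illustrates the sequence using the documented `aws sts assume-role` options. Both function names, the default session name, and the ARNs in the usage note are placeholders.

```python
import json


def assume_role_command(role_arn, mfa_serial, token_code,
                        session_name="cli", duration_seconds=3600):
    """Build the argv for `aws sts assume-role` with an MFA token code."""
    return [
        "aws", "sts", "assume-role",
        "--role-arn", role_arn,
        "--role-session-name", session_name,
        "--serial-number", mfa_serial,
        "--token-code", token_code,
        "--duration-seconds", str(duration_seconds),
    ]


def credentials_to_env(response_json):
    """Map the CLI's JSON response onto the standard AWS environment variables."""
    creds = json.loads(response_json)["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }
```

In practice the command would be run with `subprocess.run(cmd, capture_output=True, text=True, check=True)` and the resulting variables exported into a shell that expires along with the session, so the long-lived access keys never reach Terraform or other tools directly.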
Best Practice: AWS Secrets (Server-side)
Dynamic, Auto Rotating Credentials for Server Applications
Never ever hardcode AWS credentials on EC2 instances
Server Side (e.g. EC2 Instance, Docker Container)
● IAM Instance Profiles with Assumed Roles
● Use Kube2IAM with Kubernetes (kops)
https://github.com/cloudposse/charts/tree/master/incubator/kube2iam-kops
○ Temporary AWS credentials
○ Drop-in compatibility with all official AWS client libraries
Best Practice: Bootstrap Secrets
Secrets you need to provision new clusters on AWS...
● Run terraform inside of Container
● Private S3 Configuration Bucket
● Encrypted Bucket Objects
● Mount S3 Bucket inside container (S3FS)
● Use /dev/shm for caching
Geodesic: https://github.com/cloudposse/geodesic
Best Practice: Password Managers
Store Organizational Secrets in Password Manager
(webhook urls, master account credentials, shared MFA)
Use Vaults specific to some shared objective (e.g. team)
Require MFA for decryption
Avoid Shared Credentials as much as possible (this is a last resort)
SSO > Shared Passwords
Pro tip: Use 1Password for Teams. Abandon all other password managers.
Best Practice: Avoid Password Rules
They don't work
They frustrate average users
Penalize people that use real random password generators
They are often computationally weaker → vulnerable to brute force attacks
https://blog.codinghorror.com/password-rules-are-bullshit/
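The claim that composition rules penalize random generators can be checked with simple arithmetic. The sketch below assumes passwords are drawn uniformly at random, which is the best case for any scheme; `entropy_bits` is a name chosen for this example.

```python
import math


def entropy_bits(alphabet_size, length):
    """Bits of entropy for a password drawn uniformly at random."""
    return length * math.log2(alphabet_size)


# 8 characters from the 94 printable ASCII symbols (passes typical rules)
short_complex = entropy_bits(94, 8)
# 16 random lowercase letters (rejected by "must contain a digit and a symbol")
long_simple = entropy_bits(26, 16)
```

The 8-character password works out to roughly 52 bits and the 16-letter one to roughly 75 bits, so a rule set that accepts the first and rejects the second is steering users toward the weaker option.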
SaaS Cocktails: What We Use
The Bible
https://landing.google.com/sre/book.html
__EOF__
Erik Osterman, Founder
Cloud Posse, LLC
hello@cloudposse.com
https://cloudposse.com/
https://github.com/cloudposse/

An Ensemble Core with Docker - Solving a Real Pain in the PaaS Erik Osterman
 
Docker Demystified - Virtual VMs without the Fat
Docker Demystified - Virtual VMs without the FatDocker Demystified - Virtual VMs without the Fat
Docker Demystified - Virtual VMs without the FatErik Osterman
 
Speeding up Page Load Times by Using the Starling Queue Server
Speeding up Page Load Times by Using the Starling Queue ServerSpeeding up Page Load Times by Using the Starling Queue Server
Speeding up Page Load Times by Using the Starling Queue ServerErik Osterman
 
Speeding up Page Load Times by Using Starling
Speeding up Page Load Times by Using StarlingSpeeding up Page Load Times by Using Starling
Speeding up Page Load Times by Using StarlingErik Osterman
 
RightScale User Conference: Why RightScale?
RightScale User Conference: Why RightScale?RightScale User Conference: Why RightScale?
RightScale User Conference: Why RightScale?Erik Osterman
 

Más de Erik Osterman (7)

Unlimited Staging Environments on Kubernetes
Unlimited Staging Environments on KubernetesUnlimited Staging Environments on Kubernetes
Unlimited Staging Environments on Kubernetes
 
Docker Demystified for SB JUG
Docker Demystified for SB JUGDocker Demystified for SB JUG
Docker Demystified for SB JUG
 
An Ensemble Core with Docker - Solving a Real Pain in the PaaS
An Ensemble Core with Docker - Solving a Real Pain in the PaaS An Ensemble Core with Docker - Solving a Real Pain in the PaaS
An Ensemble Core with Docker - Solving a Real Pain in the PaaS
 
Docker Demystified - Virtual VMs without the Fat
Docker Demystified - Virtual VMs without the FatDocker Demystified - Virtual VMs without the Fat
Docker Demystified - Virtual VMs without the Fat
 
Speeding up Page Load Times by Using the Starling Queue Server
Speeding up Page Load Times by Using the Starling Queue ServerSpeeding up Page Load Times by Using the Starling Queue Server
Speeding up Page Load Times by Using the Starling Queue Server
 
Speeding up Page Load Times by Using Starling
Speeding up Page Load Times by Using StarlingSpeeding up Page Load Times by Using Starling
Speeding up Page Load Times by Using Starling
 
RightScale User Conference: Why RightScale?
RightScale User Conference: Why RightScale?RightScale User Conference: Why RightScale?
RightScale User Conference: Why RightScale?
 

Último

AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01KreezheaRecto
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptMsecMca
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdfSuman Jyoti
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Christo Ananth
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 

Último (20)

AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 

The "Holy Grail" of Dev/Ops

  • 1. The “Holy Grail” of Dev/Ops A practical guide to what we’ve done at Cloud Posse Prepared by Erik Osterman Cloud Posse, LLC June 2017
  • 4. About Me ● Former Director of Cloud Architecture, CBS Interactive in San Francisco ● Ran Operations for TV.com, Metacritic.com, and Clicker.com ● Worked with AWS since 2006 / Private Invite-only Beta ● Advise numerous successful venture-backed startups ● Backend Software Developer, Open Source Advocate / Contributor ● Took ~2 years off to travel; visited ~30 countries
  • 5. This Talk ● ~90 Minutes ● Q&A at the end ● Write questions in the chat ● Actionable, practical advice ● Collection of our “Best Practices”
  • 6. Best Practices (my) definition: An opinionated & proven strategy with specific tactics to help achieve the objectives for some overarching goal.
  • 8. Our Best Practices Organizational Software Development, CI/CD, Testing, Q&A Infrastructure, Automation, Orchestration Logging, Monitoring, Alerting, Escalation, Remediation Security
  • 10. Realize we’re different. Managers vs Makers - We work differently (Paul Graham - Y Combinator Founder) Makers plan in half-day blocks of time Managers plan to minimize empty 15-minute slots in their calendar Interrupts are costly for developers and therefore the business
  • 11. HumanOps (i.e. not cyborgs) Humans get tired and stressed, they feel happy and sad. Human issues are system issues. Human health impacts business health. Humans need to switch off and on again (aka sleep). Humans build and fix systems. Humans > systems http://www.humanops.com/
  • 12. Right Tools for the Job Email == external communication (not tasks, threaded conversations, cat pics) Slack == all internal communications; channels for topics #dogs Quip == all documentation for transparency (engineering & business) Zoom == reliable cross-platform conferencing Asana == issue tracking
  • 13. Technical Debt is Real Tradeoffs are inevitable. Pay the tax now or later. Later usually means bankruptcy & software rewrites Includes upgrades, refactoring, optimizations, etc It’s anything that doesn’t move the product forward But it will hold the product back This is not just a software problem. It’s a business problem too. ...and unavoidable
  • 15. Software Development Cloud Native Design - the “12 Factor” Pattern Stable Code Requires Feature Branching / Pull Requests / Code Reviews Versioning / Version Pinning Logging Local Development Environments
  • 16. Some Bad Practices Cowboy Coding, committing to master Hardcoding secrets, hostnames, paths, etc “Clever” code is often “complicated” code Writing un-greppable code, terse variable names, Inconsistent naming conventions, long functions, and………… you get the point. Using tabs :P
  • 17. Some Good Ones…. Strict Linting (e.g. eslint, go lint) Semantic Versioning (semver) .editorconfig (tabs or spaces? http://editorconfig.org/) Seed project repositories CHANGELOG.md
  • 18. Best Practice: Open Source Pattern* Leads to much cleaner code with fewer proprietary dependencies Fewer proprietary dependencies make it more reusable across projects If you decide to release, it demonstrates the kind of engineering you do It works because the developer’s ego is on the line to write stuff that doesn’t suck Pro tip: follow the conventions of your favorite framework or package system * Does not require that the organization releases code as open source
  • 19. Best Practice: README.md & CHANGELOG.md Use well-formed Markdown syntax (.md) Write “README” files on all your projects. Explain the purpose of the project Show how to get started and where to look for more information Document breaking changes & upgrade path in CHANGELOG.md Pro tip: Use a markdown editor if you’re not familiar with the syntax
  • 20. Best Practice: Use Makefiles Provide targets for common usage E.g. deps, build, run, clean Include them with all repos Document each target’s purpose (##)
  • 21. Makefile Example -include .secrets DB_HOST ?= localhost ## build a docker image build: docker build -t cloudposse/test . ## run container run: docker run -v $$(pwd):/app -e DB_HOST=$(DB_HOST) -e DB_PASS=$(DB_PASS) -p 8080:80 cloudposse/test ## test test: curl http://localhost:8080/
  • 22. Best Practice: Local Dev Environments Onboarding new hires should take minutes not hours Use fully automated local dev environments Use same Docker images that will run in staging/production Bind-mount local volumes to speed up iterations for “live editing” Pro Tip: Use docker-compose rather than vagrant which is too heavy
  • 23. Best Practice: Developers write Dockerfiles Always use alpine:3.5 Base images (be wary of unofficial images) Declare all ENV in Dockerfile (like function arguments to an OS) Write as few layers as possible (chain with && ) Version Pin Everything Use 2-stage build process for thin images (C/C++, Golang)
  • 24. Best Practice: Branch Protection Essential for security and stability of your codebase Require PR approval to merge to master Force branches to be up-to-date Disallow commits to master Restrict to squash+merge
  • 25. Best Practice: Branch Protection
  • 26. Best Practice: Pull Requests Smaller the better; implement exactly 1 feature Milestones Use Labels: Define PULL_REQUEST_TEMPLATE (## what, ## why, ## dependencies) Use checkboxes for TODOs ….for clean commit histories in master
  • 27. What a PR should look like....
  • 28. Best Practice: Follow PRs with Trailer http://ptsochantaris.github.io/trailer/
  • 29. Best Practice: Application Logging Use JSON structured log events Libraries will efficiently generate/parse Human readable, highly consistent Pro tip: use Sentry to aggregate errors+warnings and log them in issue tracker
  • 31. Best Practice: Pair Programming Lose: speed (arguably) Gain: fewer bugs, business continuity, education, team building/camaraderie When: implementing complicated features, onboarding, and triaging Pro tip: Use tmate for instant terminal sharing (https://tmate.io/)
  • 32. QA Developers with a focus on test automation Quality Control Masters of CI/CD
  • 33. Best Practice: Bug Blowouts Set aside 1 day per week to dog food your own app Prepare test scripts (aka flows) for everyone to follow Get everyone on board, not just QA. That means developers, graphic artists, customer support, etc Monitor logs, submit bugs immediately to issue tracker
  • 34. Best Practice: Synthetic Testing Continuous Testing of Critical User Paths Uses Browser to Automate Tests of Production Ensure User Registrations, Password Resets, Shopping Carts, and Checkout work 100% of the time Pro Tip: Check out Selenium or PhantomJS
  • 35. Cloud Native Design Service-Oriented Architectures (SOA) Single-purpose Services (aka micro services) Connected through APIs Highly Decoupled 12 Factor Pattern
  • 36. “12 Factor” in a Nutshell Use Environment Variables for all configuration (credentials, ports, tuning parameters, etc) Use Backing Services for everything durable Write all services as stateless & disposable Automate all admin tasks (the rest is meh)
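The first "12 Factor" point, environment variables for all configuration, could look roughly like the sketch below. `DB_HOST` and `DB_PASS` are borrowed from the Makefile example earlier in the talk; `PORT`, its default, and the `Settings` class are assumptions for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    db_host: str
    db_pass: str
    port: int


def load_settings(env=None):
    """Build settings from environment variables, failing fast on missing secrets."""
    env = os.environ if env is None else env
    # Secrets get no default: a missing value should stop the service at startup.
    missing = [key for key in ("DB_PASS",) if key not in env]
    if missing:
        raise RuntimeError(
            "missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        db_host=env.get("DB_HOST", "localhost"),
        db_pass=env["DB_PASS"],
        port=int(env.get("PORT", "8080")),  # PORT is an assumed variable name
    )
```

The same image can then run unchanged in dev, staging, and production; only the environment passed to the container differs.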
  • 37. Best Practice: X509 Client Certificates Use CA to Sign SSL Certificates that perform certain functions Automatic transport & endpoint security for APIs Highly scalable - no API requests to validate tokens Don’t Rely on API tokens which are costly to authenticate and don’t secure the transport layer Examples: Kubernetes APIs, etcd
  • 38. CI/CD Frequency reduces Difficulty. The more you deploy, the easier it gets. Latency between check-in and production is risky. It’s like HFT. Faster delivery improves software development practices Consistency improves confidence
  • 39. Best Practice: Safe Schema Migrations Ensure applications support the same backend schema for adjacent releases Use feature flags to enable new features of backend schemas
  • 40. Best Practice: Use a Build Harness Write terse .travis.yml, circle.yml, Jenkinsfile Use the same targets in all projects Use Makefile to automate build, test Clone harness repo after git checkout Example: https://github.com/cloudposse/build-harness
  • 41. Best Practice: Liberal Tagging Tag all docker images with multiple tags, in addition to release tags Let $ref = {branch|tag} Then, tag $ref $ref-$build $git_hash
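The tagging rule on the slide above can be sketched as a small helper. The function name and the step that replaces characters such as `/` in branch names are assumptions; the three tags themselves (`$ref`, `$ref-$build`, `$git_hash`) follow the slide.

```python
import re


def image_tags(ref, build, git_hash):
    """Return the tags to apply to one image: $ref, $ref-$build, $git_hash."""
    # Assumption: branch names like "feature/login" are sanitized, because
    # Docker tags only allow letters, digits, underscores, periods, and dashes.
    safe_ref = re.sub(r"[^A-Za-z0-9_.-]", "-", ref)
    return [safe_ref, f"{safe_ref}-{build}", git_hash]


if __name__ == "__main__":
    for tag in image_tags("feature/login", 17, "abc1234"):
        print(f"cloudposse/test:{tag}")
```

With all three tags in the registry, a deploy can pin the immutable `$git_hash` tag while humans still find images by branch or build number.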
  • 43.
  • 44. It is not… a) A dedicated team within the organization b) A job title c) A sysadmin d) A skill e) all the above
  • 46. What it actually is... A cross-disciplinary engineering culture Infrastructure is Code Automation over toil A path towards “Serverless” (but we’re still far away!) Site Reliability Engineering (“SRE”)
  • 47. Infrastructure as Code Infrastructure is now 100% API driven “Best Practices” of Development → Infrastructure Versioned Infrastructure Automated Remediations
  • 48. Best Practice: Automated Orchestration Use Terraform to fully orchestrate environments (e.g. DNS, instances, volumes, AutoScaling Groups, Load Balancers, Databases) S3 remote backends to store state for collaboration and backups Use modules to encapsulate business logic for consistency / manageability Version pin modules and dependencies to ensure stability
  • 49. Best Practice: Tools as Containers Only local dependency should be docker and maybe make =) Distribute all other local development tools or dependencies as containers (e.g. terraform, aws, kops, helm, etc...) Easier to standardize on one OS Example: https://github.com/cloudposse/geodesic/
  • 50. Best Practice: 100% Isolation Use (1) AWS Account per Stage (E.g. production, staging, dev) Use (1) VPC per Cluster Use (1) Dedicated TLD per AWS Account (e.g. foobar.com, foobar.qa, foobar.org) Use (1) Single Process Containers for all Apps
  • 51. Best Practice: Identical Environments Environments should only differ in size, not shape “Production”, “Staging”, “Dev” are only labels Run as many parallel environments as we need Only manual action is initiating build E.g. other labels: pentest, loadtest, erik Pro tip: each environment gets its own DNS zone (e.g. erik.cloudposse.org)
  • 52. What We Want Reliable - we want things to be online 100% of the time and when things go wrong, we want them to auto-heal. Fast - we want to run a site that can scale horizontally as traffic increases Easy - we shouldn't need rocket scientists to operate it on a day-to-day basis Affordable - we want it to be easy and cost effective to maintain in the long run Maintainable - we want to have a development or staging environment that is identical to production, so we can efficiently work on new versions of the site without it affecting production Secure - we don't want to get hacked
  • 53. Technically, we need this… “Everything” Horizontal Auto Scaling, Auto Healing, Auto DNS, Auto SSL Automated deployments and rollbacks, Versioned History Service Discovery & Load Balancing Batch Job, Scheduled Job Execution Storage/Volume Orchestration ...out of the box
  • 54. Best Practice: Use Kubernetes (sometimes) Ideally suited for microservices architectures, larger engineering teams “Infrastructure as Code” - write documents that describe your microservices (Pods ~ VMs, ReplicaSets ~ clusters, Services ~ Load Balancers) Comes with Everything out-of-the-box Cons: more complex to get started, difficult to triage issues, requires SME Pro tip: Use kops to spin up clusters automatically in AWS and GCE
  • 56. Best Practice: Use Elastic Beanstalk Ideally suited for monolithic architectures Comes with almost Everything out-of-the-box Supports instances inside private VPC with root SSH access Formal process for promoting code to production / automatic rollbacks Pro tip: Use terraform to spin up beanstalk clusters automatically in AWS
  • 59. Best Practice: Immutable Containers/AMIs Like “Burning” a copy of your code in an image Easy to know exactly what is running Fast to deploy and rollback Use Docker containers for applications Use something like CoreOS for underlying host (~dom0)
  • 60. Best Practice: Imperative Infrastructure “Give me a load balancer, 2 filesystems, 2 GB ram, 4 CPUs, 4 instances” There’s no guess work about what is output Compatible with legacy architectures There’s less magic
  • 61. Monitoring Application - Synthetic Testing Infrastructure Real-User Monitoring (RUM) SLI Systems don't have feelings. They only have SLAs.
  • 62. Best Practice: Team Dashboards Display Service Level Indicators (~ KPIs) relevant for specific teams Create dashboards for specific services like Kafka and Zookeeper First place to look when triaging issues Pro tip: Use Datadog dashboards with namespace filtering on clusters
  • 64. Alerting Alert Fatigue == Human Fatigue Dashboards > Alerts > Email Human health impacts business health. Budgets Metrics driven; not log events Alerts need to be actionable - with links to documentation
  • 67. Escalation & Remediation Automate as much as possible, escalate to a human as a last resort. KPI~SLI / SLO / SLA On-call Engineers PagerDuty - Manage Calendars and Phone/SMS Escalations
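"Automate as much as possible, escalate to a human as a last resort" can be sketched as a simple control loop. Everything here is hypothetical scaffolding: `check`, the `remediations` list, and `escalate` stand in for a real health check, real scripted fixes, and a real paging integration.

```python
def remediate_or_escalate(check, remediations, escalate):
    """Try each automated remediation in order; page a human only if all fail.

    check        -- callable returning True when the service is healthy
    remediations -- list of (name, callable) pairs, cheapest fix first
    escalate     -- callable that notifies the on-call engineer
    """
    if check():
        return "healthy"
    for name, action in remediations:
        action()
        if check():
            # Report which fix worked so it shows up in the audit trail.
            return f"remediated:{name}"
    escalate()
    return "escalated"
```

Ordering the remediations from least to most disruptive keeps the blast radius small, and the returned string gives the post-mortem a record of what was tried.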
  • 68. Best Practice: #OCE Slack Channel One channel to reach engineers Searchable history of events and conversations Use topic to announce who is on-call Linked Google Calendar with Relevant Events (E.g. Customer Demo Calendar)
  • 69. Best Practice: Post-Mortems Kill the shame game. Human issues are system issues. 5 Whys - Root Cause Analysis (“RCA”) Use Consistent Template (KISS) Weekly Retrospectives with past OCEs and Stakeholders Documented in Quip → Instantly Searchable Pro Tip: Check out how Google does it: https://landing.google.com/sre/book/chapters/postmortem-culture.html
  • 70. Security 100% Security Cannot Be Achieved Assume systems are insecure Devalue credentials with MFA
  • 71. What not to do... 1. Store secrets in git repository 2. Hardcode secrets in configurations 3. Write them in plain-text 4. Manually distribute them 5. Reuse/share keys across users and apps 6. Build homegrown systems to protect secrets (* unless you’re Netflix, Hashicorp or Google) ...but you already knew that!
  • 72. Best Practice: Beyond Corp Model Enterprise zero-trust security model used by Google Shift access controls from the network perimeter to individual devices/users Allow employees to work more securely from any location Do not rely on traditional VPNs
  • 73. Best Practice: Identity-Aware Proxy (IAP) Protect internal services using an IAP Integrates cleanly with your SSO provider MFA Pro tip: Use the Bitly OAuth2 Proxy to add auth layer to any service
  • 74. Best Practice: Bastion Host Centralized point for accessing systems Session logs, Slack Login Notifications Require MFA to authenticate Disable proxy mode and TCP socket forwarding Use bastion only for triage, not administration (because that’s scripted!) Pro Tip: Use Duo Push Notifications + Geofencing
  • 75. Best Practice: Login Justifications
  • 76. Best Practice: SSH Key Management 2 options - Github Public Key API or Signed Certificates ● You can’t protect the private key ● You can add multiple factors (a.k.a. MFA) ● Our Solution ○ Use Github Public Key API to distribute public keys https://github.com/cloudposse/github-authorized-keys ○ Use Duo for MFA Push Notifications + Geofencing https://github.com/cloudposse/bastion Pro tip: Check out Bless by Netflix
  • 77. Duo Slack Integration and Dashboard
  • 78. Best Practice: SSM Scripted Remediations Use SSM to execute commands in parallel across machines (don’t use parallel ssh since that is harder to audit) Full audit logs of command and output Use IAM roles to restrict execution Pro tip: use the aws cli to trigger remediations on the command line
  • 79.
  • 80. Best Practice: Federated Accounts Reduce the blast radius when things explode Use one account per environment: dev, staging, production Use one account for billing aggregation, IAM federation Assumed Roles (e.g. read-only, admin, dba) MFA required to assume roles - to devalue credentials Pro Tip: Use STS API with MFA to generate short lived AWS credentials Example: https://github.com/cloudposse/aws-assumed-role AWS
  • 81. Best Practice: AWS Secrets (Client-side) Client Side (e.g. Terraform, AWS Cli) ● IAM User Account Access Keys (never shared!) ● Access Keys only permit Assume Role+MFA ● Assumed Roles (limit scope) ● Temporary Sessions Tokens with STS (expire after 1 hour) ● MFA (devalue credentials) Solution: https://github.com/cloudposse/aws-assumed-role
  • 82. Best Practice: AWS Secrets (Server-side) Dynamic, Auto Rotating Credentials for Server Applications Never ever hardcode AWS credentials on EC2 instances Server Side (e.g. EC2 Instance, Docker Container) ● IAM Instance Profiles with Assumed Roles ● Use Kube2IAM with Kubernetes (kops) https://github.com/cloudposse/charts/tree/master/incubator/kube2iam-kops ○ Temporary AWS credentials ○ Drop-in Compatibility with all official AWS client libraries
  • 83. Best Practice: Bootstrap Secrets Secrets you need to provision new clusters on AWS... ● Run terraform inside of Container ● Private S3 Configuration Bucket ● Encrypted Bucket Objects ● Mount S3 Bucket inside container (S3FS) ● Use /dev/shm for caching Geodesic: https://github.com/cloudposse/geodesic
  • 84. Best Practice: Password Managers Store Organizational Secrets in Password Manager (webhook urls, master account credentials, shared MFA) Use Vaults specific to some shared objective (e.g. team) Require MFA for decryption Avoid Shared Credentials as much as possible (this is a last resort) SSO > Shared Passwords Pro tip: Use 1Password for Teams. Abandon all other password managers.
  • 85. Best Practice: Avoid Password Rules They don't work They frustrate average users Penalize people that use real random password generators They are often computationally weaker → vulnerable to brute force attacks https://blog.codinghorror.com/password-rules-are-bullshit/
  • 86. Best Practice: Avoid Password Rules
  • 89. __EOF__ Erik Osterman, Founder Cloud Posse, LLC hello@cloudposse.com https://cloudposse.com/ https://github.com/cloudposse/