Copyright© 2017 GoDaddy Inc. All Rights Reserved.
Don’t Repeat Our Mistakes!
Lessons Learned from Running Go Daddy’s Private Cloud
Kris Lindgren
klindgren@godaddy.com
Mike Dorman
mike.dorman@sendgrid.com
OpenStack Queens Summit, November 2017, Sydney
OpenStack at Go Daddy
● 2013: POC cloud (Havana)
● 2014: First production apps (Icehouse)
● 2014: Nova cells v1 (Kilo)
● 2015: “OpenStack everywhere” (Liberty)
● 2017: Working toward containerized services
OpenStack at Go Daddy
● What we built:
○ Shared nothing regions
○ Ephemeral disk on local storage
○ Simple networking
○ No live migration
○ Multiple AZs
● Scale
○ 1000s of computes, >100,000 cores
○ 10,000s of VMs
Avoiding “Accidental Architecture”
Product Infrastructure & Scaling Management
Product - Need for Chargeback/Showback
Private Cloud = Free Compute
Free Compute = High Demand
High Demand = Overconsumption
Product - Have a Cohesive Vision
• Which OpenStack Services/features
• User onboarding/offboarding
• Patching cadences/methodology
• Legacy integrations
• Adding capacity
• SLAs
• How do end users “consume” OpenStack?
• Procedure for changing the vision
• Helps with cloud paradigm shift
• Expect and tolerate failure
Product Issues - How to Avoid
• Manage expectations (for yourself and for users)
• Showback and controls around quota
• Education and evangelism
• Docs and sample code
• “Cloud ready” early adopters
• Ongoing guidance
1. Cloud
2. ??????
3. Profit! ✗
Scaling - Nova Cells (v1)
Justification
• Assumed we would grow fast
• Challenges with scaling Nova/RMQ
• Easier earlier than later
• Ongoing debt to manage patches
• Cells v2 was coming soon
http://www.dorm.org/blog/converting-to-openstack-nova-cells-without-destroying-the-world/
Scaling - Nova Cells (v1)
Retrospective
Good
• Helped us to scale
• Gained expertise with Nova
• Community street cred
Bad
• No scaling for Neutron
• Patches get more difficult
• Non-standard config
• Delays on v2
• Migration to v2 is unknown
20/20 Hindsight
• Scale/shard RMQ instead
• Aspirations about scale
• Porting patches is top blocker
Scaling - Collapsed Architecture
Justification
• Colocated API services and RMQ
• (Except Glance)
• Dedicated hardware overkill
• Local Python packages
• Made sense for POC
• Nova separated later with Cells v1
Scaling - Collapsed Architecture
Retrospective
Good
• Simple architecture
• Minimal hardware
• Easy network ACLs
• Up and running fast
Bad
• Large failure impacts
• Resource contention
• Single API endpoints
20/20 Hindsight
• OK for POC
• Ignored it too long
• Easy to scale out
• (Implementing now)
Infrastructure - Special Neutron Architecture
Justification
• Neutron L2 assumptions
• L3 folded clos network
• L2 stops at leafs
• Uncomfortable with overlays
• Provider network per rack
• Routed floating IPs
• Overload AZ to pick a network
• Local patches for network scheduling
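The idea of one provider network per rack, with scheduling that picks a network the target host can actually reach, can be sketched as below. This is not the patch described in the talk: `ProviderNetwork` and `pick_network` are hypothetical names, and the "most free IPs" tie-breaker is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderNetwork:
    """A rack-local L2 segment (hypothetical model)."""
    name: str
    rack: str
    total_ips: int
    used_ips: int

    @property
    def free_ips(self):
        return self.total_ips - self.used_ips


def pick_network(rack, networks):
    """Pick the network in `rack` with the most free IPs, or None.

    Because L2 stops at the leaf, only networks in the host's own
    rack are candidates.
    """
    candidates = [n for n in networks if n.rack == rack and n.free_ips > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.free_ips)


networks = [
    ProviderNetwork("r1-a", rack="r1", total_ips=254, used_ips=250),
    ProviderNetwork("r1-b", rack="r1", total_ips=254, used_ips=100),
    ProviderNetwork("r2-a", rack="r2", total_ips=254, used_ips=254),
]
chosen = pick_network("r1", networks)
```

A per-network count of free addresses is the kind of data an IP usage API would need to expose for a scheduler like this to work.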
Infrastructure - Special Neutron Architecture
Retrospective
Good
• Same for VMs and metal
• Simple infrastructure
• Easy on users
• Network IP usages API
• Segmented networks spec
Bad
• Snowflake setup
• L2 adjacency expectations
• Added features difficult (LBaaS)
• Migration to Neutron segmented networks?
20/20 Hindsight
• Works pretty well
• Patches are limited
• IP usages API extension
• Segmented networks in Neutron
• Many others with same problem
Management - Puppet Single Source of Truth
Justification
• Big Puppet shop
• Single source of config
• Good for server bootstrapping
• OpenStack-Puppet modules
• API providers
• Code pipeline already in place
• Ansible kicks off puppet apply
Management - Puppet Single Source of Truth
Retrospective
Good
• Single source of config (in theory)
• Efficient bootstrapping
• NOOP mode for sanity
Bad
• State in Puppet, Hiera, APIs
• Some managed manually
• Duplicate API objects
• Omnibus deployments
• NOOP report not always accurate!
• Orphaned/forgotten servers
• Orchestration difficult
20/20 Hindsight
• Many unintended problems
• Not really a single source
• Need for targeted deployments
• Other tools for orchestration
• Use for bootstrapping
Strategies for Avoiding Accidental Architecture
• Think of your future selves
  • Quantify tech debt interest
• Almost nothing will be temporary
  • Make a specific plan and timeline
• Carefully consider scale
  • Overestimating can be as bad as underestimating
• Automate first
  • At least make it capable
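One rough way to quantify tech debt interest is to compare the one-time cost of removing a workaround with the recurring cost of carrying it, for example the effort of porting local patches at every upgrade. The function name and the figures below are hypothetical; the point is only the break-even arithmetic.

```python
import math


def break_even_upgrades(payoff_days, interest_days_per_upgrade):
    """Smallest number of upgrades at which the cumulative carrying cost
    meets or exceeds the one-time payoff cost."""
    if payoff_days < 0 or interest_days_per_upgrade <= 0:
        raise ValueError("payoff must be >= 0 and interest must be > 0")
    return math.ceil(payoff_days / interest_days_per_upgrade)


# Hypothetical numbers: 30 engineer-days to remove the local patches,
# versus 8 engineer-days to re-port them at each upgrade.
upgrades = break_even_upgrades(payoff_days=30, interest_days_per_upgrade=8)
```

With these made-up figures the debt costs more than its payoff by the fourth upgrade, which gives a concrete deadline for the plan and timeline.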
Strategies for Avoiding Accidental Architecture
• KISS!
http://stella.report
Strategies for Avoiding Accidental Architecture
• Spread the knowledge wealth
http://stella.report
“The problem, [...] is that we are attempting to build systems that are beyond our ability to intellectually manage.” *
* The Coming Software Apocalypse: https://www.theatlantic.com/technology/archive/2017/09/saving-the-world-from-code/540393/
Recap: How to Live with No Regrets
● Manage expectations
● Education and evangelism
● Helpful early adopters
● Ongoing guidance
● Remember your future self
● Account and plan for tech debt
● Sane scale expectations
● Automate, automate, automate
● Simplicity
● Knowledge sharing
Questions?
Other Ideas?
klindgren@godaddy.com
mike.dorman@sendgrid.com

Don't Repeat Our Mistakes! Lessons Learned from Running Go Daddy's Private Cloud (OpenStack Queens Summit)

  • 1. Copyright© 2017 GoDaddy Inc. All Rights Reserved. Don’t Repeat Our Mistakes! Lessons Learned from Running Go Daddy’s Private Cloud Kris Lindgren klindgren@godaddy.com Mike Dorman mike.dorman@sendgrid.com OpenStack Queens Summit, November 2017, Sydney
  • 2. OpenStack at Go Daddy ● 2013: POC cloud (Havana) ● 2014: First production apps (Icehouse) ● 2014: Nova cells v1 (Kilo) ● 2015: “OpenStack everywhere” (Liberty) ● 2017: Working toward containerized services
  • 3. OpenStack at Go Daddy ● What we built: ○ Shared nothing regions ○ Ephemeral disk on local storage ○ Simple networking ○ No live migration ○ Multiple AZs ● Scale ○ 1000s of computes, >100,000 cores ○ 10,000s of VMs
  • 4. Avoiding “Accidental Architecture” Product Infrastructure & Scaling Management
  • 5. Product - Need for Chargeback/Showback Private Cloud = Free Compute Free Compute = High Demand High Demand = Overconsumption
  • 6. Product - Have a Cohesive Vision • Which OpenStack Services/features • User onboard/off-boarding • Patching cadences/methodology • Legacy integrations • Adding capacity • SLAs • How do end users “consume” OpenStack? • Procedure for changing the vision • Helps with cloud paradigm shift • Expect and tolerate failure
  • 7. Product Issues - How to Avoid • Manage expectations (for yourself and for users) • Showback and controls around quota • Education and evangelism • Docs and sample code • “Cloud ready” early adopters • Ongoing guidance 1.Cloud 2.?????? 3.Profit!X
  • 8. Scaling - Nova Cells (v1) Justification • Assumed we would grow fast • Challenges with scaling Nova/RMQ • Easier earlier than later • Ongoing debt to manage patches • Cells v2 was coming soon http://www.dorm.org/blog/converting-to-openstack-nova-cells-without-destroying-the-world/
  • 9. Scaling - Nova Cells (v1) Retrospective Good • Helped us to scale • Gained expertise with Nova • Community street cred Bad • No scaling for Neutron • Patches get more difficult • Non-standard config • Delays on v2 • Migration to v2 is unknown 20/20 Hindsight • Scale/shard RMQ instead • Aspirations about scale • Porting patches is top blocker
  • 10. Scaling - Collapsed Architecture Justification • Colocated API services and RMQ • (Except Glance) • Dedicated hardware overkill • Local python packages • Made sense for POC • Nova separated later with Cells v1
  • 11. Scaling - Collapsed Architecture Retrospective Good • Simple architecture • Minimal hardware • Easy network ACLs • Up and running fast Bad • Large failure impacts • Resource contention • Single API endpoints 20/20 Hindsight • OK for POC • Ignored it too long • Easy to scale out • (Implementing now)
  • 12. Infrastructure - Special Neutron Architecture Justification • Neutron L2 assumptions • L3 folded clos network • L2 stops at leafs • Uncomfortable with overlays • Provider network per rack • Routed floating IPs • Overload AZ to pick a network • Local patches for network scheduling
  • 13. Infrastructure - Special Neutron Architecture Retrospective Good • Same for VMs and metal • Simple infrastructure • Easy on users • Network IP usages API • Segmented networks spec Bad • Snowflake setup • L2 adjacency expectations • Added features difficult (LBaaS) • Migration to Neutron segmented networks? 20/20 Hindsight • Works pretty well • Patches are limited • IP usages API extension • Segmented networks in Neutron • Many others with same problem
  • 14. Management - Puppet Single Source of Truth Justification • Big Puppet shop • Single source of config • Good for server bootstrapping • OpenStack-Puppet modules • API providers • Code pipeline already in place • Ansible kicks off puppet apply
  • 15. Management - Puppet Single Source of Truth Retrospective Good • Single source of config (in theory) • Efficient bootstrapping • NOOP mode for sanity Bad • State in Puppet, Hiera, APIs • Some managed manually • Duplicate API objects • Omnibus deployments • NOOP report not always accurate! • Orphaned/forgotten servers • Orchestration difficult 20/20 Hindsight • Many unintended problems • Not really a single source • Need for targeted deployments • Other tools for orchestration • Use for bootstrapping
  • 16. Strategies for Avoiding Accidental Architecture • Think of your future selves • Quantify tech debt interest • Almost nothing will be temporary • Make a specific plan and timeline • Carefully consider scale • Overestimating can be as bad as underestimating • Automate first • At least make it capable
  • 17. Strategies for Avoiding Accidental Architecture • KISS! http://stella.report
  • 18. Strategies for Avoiding Accidental Architecture • Spread the knowledge wealth http://stella.report “The problem, [...] is that we are attempting to build systems that are beyond our ability to intellectually manage.” * * The Coming Software Apocalypse: https://www.theatlantic.com/technology/archive/2017/09/saving-the-world-from-code/540393/
  • 19. Recap: How to Live with No Regrets ● Manage expectations ● Education and evangelism ● Helpful early adopters ● Ongoing guidance ● Remember your future self ● Account and plan for tech debt ● Sane scale expectations ● Automate, automate, automate ● Simplicity ● Knowledge sharing Questions? Other Ideas? klindgren@godaddy.com mike.dorman@sendgrid.com

Editor's notes

  1. KRIS
  2. KRIS Late 2013: Havana, POC/“dev pilot” cloud. Morphed into production cloud by 2014. On Liberty since 2015, blocked on containerization.
  3. KRIS So what did we end up building? Totally separate clouds at multiple locations (no shared Keystone). VMs boot to local storage and may have a Cinder volume to store “persistent data”. Network-centric approach to networking: let the networking gear take care of the packets. No live migration, meaning we didn’t want to have pets. Teams were advised to be able to rebuild servers. Anything with state should not go on OpenStack (databases).
  4. MIKE Decisions have long-lasting impacts and are tough to change later. Talk about general categories of things: Infrastructure & Scaling, Management (config management), Product. We’re going to work backwards, and Kris is going to kick us off with some issues around product.
  5. MIKE Free for everybody, then it gets all used up and isn’t there when needed. Tragedy of the commons (find some good images for this). Had a lot of trouble keeping up with capacity consumption, so we would run out of space. We had ridiculously high quotas, intending to report back to corporate finance (but that never happened).
  6. KRIS I know there is a lot of text on this slide, but I am not going to go through everything here. DOCUMENT WHAT YOU ARE PROVIDING: SLAs; patching policy; how you want end users to use what you built; example deployments/architectures; integrations with legacy applications. If you are running a cloud and you don’t have this documented, please, please, please do the work to get this documented and agreed upon. Product vision drives your technical requirements. Small changes to the vision/requirements can cause a fundamental shift in what you need to provide. After getting this documented, also get the process for how to change the vision documented.
  7. MIKE Be clear on what you’re providing and what you’re not (before you build it). Know where you are going! Even if you don’t plan to actually charge others real money for using your cloud, you need to show them what they’re using and translate that to value somehow. Definitely enforce some quota control. Talk about how we opened up our quotas with intentions to report back to the finance department against budgets (which never happened). Unless you can actually scale hardware super fast (you can’t), it can’t just be a free-for-all. Education and evangelism: good docs, getting started guides, sample code. Give them something to copy and paste. Start with teams that are already “cloud ready” as early adopters. Provide ongoing architectural and constructive advice. Don’t be arrogant or treat people who aren’t at your level yet poorly. This should go without saying, but if we’re honest, we all have condescending attitudes toward some. Help when things go wrong. Describe SendGrid ProdOps team. Now, moving on to the more technical architecture decisions.
  8. MIKE We knew we would grow fast (see earlier graph). Known challenges with scaling Nova/RMQ. Easier to move to cells v1 early, rather than a fire-drill scaling exercise later. Knew we would take on some ongoing debt to forward-port v1 patches for each new version. Cells v2 was coming “real soon now”. Details about how we did it are in the link to my YVR talk about moving to cells.
  9. MIKE Good: Helped us to scale and segment our infrastructure (failure boundaries). Gained a lot of expertise with Nova. Street cred in community (LDT group, etc.). Bad: Neutron doesn’t scale the same way, which ended up being our main bottleneck (not Nova). Forward-porting patches becomes more and more difficult over time (eternal thanks to Sam from NeCTAR). Unknown how/if we can online-migrate to cells v2. Cells v2 still coming “real soon now” (mostly there now).
  10. KRIS Run all API/server services, plus RMQ, on one set of servers. Glance separate to stay network-adjacent to computes. Most Nova services moved later as part of cells v1. Symptom of starting small with a POC environment and then growing larger.
  11. KRIS Good: Less hardware to deal with. Simpler architecture. Easier network/firewall ACLs. It helped us get started quickly. Bad: Any problem is very impactful; it takes out a wide swath of services. Resource contention (RMQ and Neutron fighting over RAM and OOM-killing each other). No admin vs. public endpoints, so it is more difficult to do maintenance that doesn’t expose errors to users.
  12. KRIS Neutron assumes (or used to assume) that L2 is everywhere and available anywhere. In our datacenter network, L2 stops at the TOR. So getting to a server in another rack goes through the gateway of the local switch to the spine and into the other rack. Persistent IPs can be routed to any VM within the network. Overlays viewed as unnecessarily complex and difficult to troubleshoot. Provider network per rack (L2 domain); we pick a network for you based on AZ selection. Local patches to do network scheduling.
  13. KRIS Good: Able to provide the same networking paradigm to VMs as to metal. Simple infrastructure; VMs just get an IP and they’re good to go. Network IP usages API implemented and committed upstream. Kicked off segmented networks spec as a collaboration between LDT and Neutron. This remains the thing I’m most proud of accomplishing/helping with in OpenStack. Bad: Our Neutron doesn’t work like everybody else’s. People love their L2 adjacency. Unable to support more complex networking features out of the box (e.g. LBaaS). Unsure how we will go about migrating to real Neutron segmented networks.
  14. MIKE Big Puppet shop. Pretty good for server bootstrapping and config management. Wanted a one-stop shop for all config. OpenStack API providers for managing users, groups, roles, AZs, networks, etc. Code review/pipeline already in place. Config mostly in Puppet and Hiera repos. State of OS resources inside APIs. Physical hosts in a manually curated Ansible hosts file.
  15. MIKE Good: Single place for all config (in theory). Helpful for new server bootstrapping and initial config. Noop mode helpful to see what will happen. Bad: Config and current state were actually split across Puppet and Hiera repos, as well as the service APIs. Difficulties with API providers led to duplicate objects (networks, AZs). Difficult to do non-omnibus targeted deployments (Puppet upgraded RabbitMQ, whoops!). Roles and grants still managed manually, ad hoc. Noop report not always accurate! Sometimes servers are forgotten about because we forget to put them in the list. Difficult to do more intelligent orchestration of things when the data is all over the place.
  16. MIKE Think of your future self. Almost nothing will be temporary, unless you have a specific plan and timeline for moving away from it, and you can trust yourself to follow through. Try to quantify the interest you will pay on the tech debt. Consider your expected scale (more than seat-of-your-pants). It is just as bad to overestimate and overbuild as to underestimate. Automate first (or at least make sure the capability is there).
  17. MIKE Keep it simple. The perfect design is not when nothing else can be added, but when nothing else can be removed. As we were working on this, the Stella Report came out, which articulates pretty well a lot of the ideas we were thinking about, particularly around the idea of complexity and a term they coined, “dark debt”. Dark debt/unknown unknowns come from complexity (link to Velocity talk/Stella paper). The best thing you can do to minimize it is to keep things as simple as possible.
  18. MIKE Do as much as you can to simplify, but it’s still complex. Spread the knowledge wealth. Try to keep everybody up to speed with what’s going on. Keep the “mental models” of the system accurate and up to date (above the line/below the line). Avoid individual/tribal knowledge.
  19. KRIS
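
The notes on slides 5 and 7 recommend showback and enforced quotas even when no real money changes hands. The following is a minimal sketch of what such a report could look like, not the tooling used at Go Daddy. The rate, the project names, and the `ProjectUsage` fields are all hypothetical; in practice the usage figures would come from whatever usage and quota data your cloud exposes.

```python
from dataclasses import dataclass

# Hypothetical internal rate; the talk gives no pricing figures.
RATE_PER_VCPU_HOUR = 0.02


@dataclass
class ProjectUsage:
    name: str
    vcpu_hours: float   # vCPU-hours consumed over the reporting period
    vcpus_in_use: int   # vCPUs currently allocated
    vcpu_quota: int     # enforced limit (assumed to be greater than zero)


def showback_report(projects):
    """Translate raw usage into a notional cost and a quota status per project."""
    rows = []
    for p in projects:
        rows.append({
            "project": p.name,
            "notional_cost": round(p.vcpu_hours * RATE_PER_VCPU_HOUR, 2),
            "quota_used_pct": round(100 * p.vcpus_in_use / p.vcpu_quota, 1),
            "over_quota": p.vcpus_in_use > p.vcpu_quota,
        })
    # Largest consumers first, so the report starts with the biggest spenders.
    return sorted(rows, key=lambda r: r["notional_cost"], reverse=True)


# Made-up sample data for illustration only.
usage = [
    ProjectUsage("team-a", vcpu_hours=12000, vcpus_in_use=40, vcpu_quota=64),
    ProjectUsage("team-b", vcpu_hours=90000, vcpus_in_use=130, vcpu_quota=128),
]
report = showback_report(usage)
for row in report:
    print(row)
```

Even a report this small gives each team a number to weigh against the value they get from the capacity, which is the point of showback without chargeback.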