Do you want a way to deploy CloudStack management services, including databases and supporting services, into a new environment with ease? Do you need resilience for your environment's management plane?
We've created an appliance that can host all of the components required to manage a CloudStack-based cloud infrastructure and can be deployed on various types of hardware with minimal requirements. The project led us to a few interesting technologies and methods, including a tested and customized implementation of MariaDB/Galera as the database backend for CloudStack. During this session, we will go over the appliance design and, hopefully, have a dialogue about similar deployment designs that others have used.
Sungard is a global business with fingers in various areas. I work for Availability Services, which provides DR, managed services, and IT consulting. This is a lot of stuff that you’re probably not super interested in.
Around 2009, Product Development was tasked with creating a cloud solution and, for all intents and purposes, became Cloud Engineering. We developed an enterprise-grade cloud that lets customers move their workloads into a virtualized solution without changing their processes much. It’s fully managed, which means our operations team will maintain your OS, some of your apps, etc.
Over time we’ve developed a fully automated service provisioning system that creates every aspect of a customer’s Virtual Datacenter, from routing, to firewall, to switching, all the way to VMs and the services for them, such as business continuity.
However, over time this software has also gotten a bit unwieldy and difficult for Engineering and Operations to troubleshoot, lifecycle, and maintain. So starting around 2011, the decision was made to start looking at CloudStack, at least for our Public and Dev/QA offerings. We launched our first CloudStack-powered offering in our Dublin center in November of last year. Currently I’m working on a Public Cloud offering, which will launch in the early part of 2014 and soon after in 4 other locations around the world: CloudStack for automation, and a custom-built portal for the user experience. The hope is that CloudStack will allow us to meet our customers’ changing demands in a more agile fashion.
This is what our orchestration systems look like currently. They are super expensive and pretty complex to lifecycle and troubleshoot. And cable. And purchase. And deal with in general.
CloudStack and our current design have allowed us to simplify this. We didn’t want to deal with a SAN or NAS (in case this gets deployed as an on-premises solution), so any kind of shared storage was right out: the only resource shared between these servers is the network, and that will be shared with some NAS traffic as well as outbound bandwidth. We wanted some headroom to add services (RabbitMQ for notifications is on our to-do list, etc.), so our current environment has rather beefy servers (2x 8-core processors, way too much RAM). These servers use Xen as the hypervisor, but since we’re not using any hypervisor-specific features, we don’t particularly care which hypervisor we use.
That’s guided what we’ve done with this current setup, especially with regard to our orchestration design. Keep the orchestration to as few moving parts as possible, allowing for ease of setup, maintenance, and troubleshooting. Our customers won’t tolerate downtime, so the platform is designed to be resilient and secure. Our portal will consume the CloudStack API, as will (some) customers.
So let’s talk about what’s running on those hypervisors that we don’t care about: CloudStack management servers, MariaDB with Galera for database HA, and a virtual firewall.
Both MariaDB servers are active, so we have a multi-master environment, and each CloudStack management server speaks to only one DB.
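To make the one-to-one pairing concrete, here is a minimal Python sketch that renders the Galera portion of each MariaDB node's config and the matching database host lines of each CloudStack management server's `db.properties`. The node names, addresses, cluster name, and provider path are hypothetical placeholders (the `libgalera_smm.so` location varies by distribution), and the `db.properties` output is only a fragment of the full file; this illustrates the layout rather than reproducing our actual build.

```python
from dataclasses import dataclass

# Assumption: Debian/Ubuntu provider path; RHEL-family systems usually use /usr/lib64/galera/.
GALERA_PROVIDER = "/usr/lib/galera/libgalera_smm.so"


@dataclass
class DbNode:
    name: str
    address: str


def galera_section(me: DbNode, nodes: list[DbNode], cluster_name: str) -> str:
    """Render the [galera] section of one MariaDB node's server config."""
    peers = ",".join(n.address for n in nodes)  # every node lists all cluster members
    lines = [
        "[galera]",
        "wsrep_on=ON",
        f"wsrep_provider={GALERA_PROVIDER}",
        f"wsrep_cluster_name={cluster_name}",
        f"wsrep_cluster_address=gcomm://{peers}",
        f"wsrep_node_name={me.name}",
        f"wsrep_node_address={me.address}",
        # Settings Galera expects: row-based binlogs, InnoDB, interleaved auto-increment locking.
        "binlog_format=ROW",
        "default_storage_engine=InnoDB",
        "innodb_autoinc_lock_mode=2",
    ]
    return "\n".join(lines) + "\n"


def cloudstack_db_properties(db: DbNode) -> str:
    """Fragment of db.properties pinning one management server to one DB node."""
    lines = [
        f"db.cloud.host={db.address}",
        "db.cloud.port=3306",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical names and addresses.
    nodes = [DbNode("db1", "10.0.0.11"), DbNode("db2", "10.0.0.12")]
    for mgmt, db in zip(["cs-mgmt1", "cs-mgmt2"], nodes):
        print(f"# {db.name}: MariaDB [galera] section")
        print(galera_section(db, nodes, "cs_orchestration"))
        print(f"# {mgmt}: db.properties fragment")
        print(cloudstack_db_properties(db))
```

Both Galera nodes list each other in `wsrep_cluster_address`, while each management server is pinned to a single node, so every CloudStack instance sends all of its writes to one master.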
We’re not very concerned about split brain in our design, because the hypervisors are connected via an LACP interface on which all necessary tagged networks ride. In the unlikely event that we lose both interfaces of the pair on the same host, no traffic will flow to or from that CloudStack instance, effectively fencing it off from the world. In the case of a prolonged outage, upon recovery, operations may have to make sure that the DBs are in sync before bringing the second system back online.
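The in-sync check operations would do after a prolonged outage can be sketched as a small Python helper. It is a minimal sketch, assuming the status values come from running `mysql -N -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'"` on each node (tab-separated name/value lines); the function names and the expected cluster size of 2 are illustrative assumptions, not our production tooling.

```python
def parse_wsrep_status(text: str) -> dict[str, str]:
    """Parse tab-separated `name<TAB>value` lines as printed by `mysql -N -e ...`."""
    status = {}
    for line in text.splitlines():
        if "\t" in line:
            name, value = line.split("\t", 1)
            status[name.strip()] = value.strip()
    return status


def node_is_synced(status: dict[str, str]) -> bool:
    """True if the node is ready, in the Primary component, and fully synced."""
    return (
        status.get("wsrep_ready") == "ON"
        and status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_local_state_comment") == "Synced"
    )


def rejoin_problems(survivor: dict[str, str], returning: dict[str, str],
                    expected_size: int = 2) -> list[str]:
    """List reasons not to put the returning system back in service (empty list = OK)."""
    problems = []
    if not node_is_synced(survivor):
        problems.append("surviving node is not Primary/Synced")
    if not node_is_synced(returning):
        problems.append("returning node has not finished syncing")
    # Both nodes should agree that the cluster is whole again.
    for label, status in (("survivor", survivor), ("returning", returning)):
        size = int(status.get("wsrep_cluster_size", "0"))
        if size != expected_size:
            problems.append(f"{label} sees cluster size {size}, expected {expected_size}")
    return problems
```

A node still catching up reports a `wsrep_local_state_comment` such as `Joining` or `Joined`, so the returning CloudStack instance stays out of service until both sides report `Synced` in a two-member Primary component.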
Here are the features we've currently worked out. As we've discussed, the firewall and network layout have been specifically tailored to this purpose. MariaDB setup: we've got MariaDB auto-starting, but checking for a peer to set proper mastership. If you're unfamiliar with MariaDB and Galera, to sum up: you must start one server in standalone mode (bootstrapping the cluster), and then start the secondary with knowledge of its master. If you don't wait, they will either each start up standalone or not start at all because they can't find their peer. We have created a script to mitigate this, so that you don't have to worry about startup order across machines. We also have some scripts that check whether the DB is up before starting CloudStack. These modifications allow us to boot our orchestration hypervisors without worrying about VM boot order; everything naturally shakes into place as the VMs boot.
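Our scripts aren't reproduced here, but the two startup checks described above (peer detection before starting MariaDB, and waiting for the DB before starting CloudStack) can be sketched in Python. This is a minimal sketch under stated assumptions: a peer counts as up if a TCP connect to the default Galera port 4567 succeeds, the DB counts as up if port 3306 accepts connections, and, as a hypothetical tie-break, only the node whose name sorts first may bootstrap when its peer never appears. The function names are made up for illustration. In practice "bootstrap" would map to something like `galera_new_cluster` (MariaDB 10.1+ with systemd) and "join" to a normal service start, and a real script should also consult `grastate.dat` so that the node with the newest data is the one that bootstraps.

```python
import socket
import time
from typing import Callable

GALERA_PORT = 4567  # default Galera group-communication port
MYSQL_PORT = 3306   # default MariaDB client port


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_galera_mode(me: str, peer: str,
                       probe: Callable[[str], bool] = lambda h: port_open(h, GALERA_PORT),
                       attempts: int = 30, delay: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Decide how this MariaDB node should start.

    "join":      the peer's Galera port answered; start normally and join it.
    "bootstrap": the peer never answered and this node is the designated
                 bootstrapper (hypothetical rule: its name sorts first).
    "wait":      the peer never answered and this node must not bootstrap;
                 the caller should retry later.
    """
    for _ in range(attempts):
        if probe(peer):
            return "join"
        sleep(delay)
    return "bootstrap" if me < peer else "wait"


def wait_for_db(host: str,
                probe: Callable[[str], bool] = lambda h: port_open(h, MYSQL_PORT),
                attempts: int = 60, delay: float = 5.0,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the DB port answers; CloudStack should start only on True."""
    for _ in range(attempts):
        if probe(host):
            return True
        sleep(delay)
    return False
```

The deterministic tie-break is what keeps two nodes that boot at the same moment from both bootstrapping standalone clusters: one bootstraps, the other keeps returning "wait" until the first node's Galera port answers and it can join. An open port 3306 only shows that MariaDB is listening; a stricter gate could also require `wsrep_ready` to be `ON`.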
Here are some future steps. We are finishing the Puppetization of the entire set of orchestration VMs, so that they're easier to install and maintain. We are creating a questionnaire script to feed into Puppet; that way we can create a base install, answer some questions, and have an up-and-running base CloudStack build, ready to manage an environment. This will allow us to spin up more sites more quickly. In the future, the firewall might participate in the routing core, so that we can remotely manage entire sites through a VPN tunnel to the remote set of CloudStack servers.
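The questionnaire idea can be sketched as: ask a few questions, validate the answers, and emit data that Puppet could read through Hiera's JSON backend. This is a minimal Python sketch; the questions, the Hiera keys (`site::name`, `galera::cluster_address`, `cloudstack::db_host_by_mgmt`), and the one-DB-per-management-server check are hypothetical, chosen to mirror the layout described earlier rather than taken from our Puppet modules.

```python
import ipaddress
import json

# Hypothetical questionnaire: (answer key, prompt).
QUESTIONS = [
    ("site", "Site name"),
    ("mgmt_ips", "CloudStack management server IPs (comma-separated)"),
    ("db_ips", "MariaDB/Galera node IPs (comma-separated, same order)"),
]


def parse_ips(raw: str) -> list[str]:
    """Split a comma-separated answer; ipaddress raises ValueError on a typo."""
    ips = [part.strip() for part in raw.split(",") if part.strip()]
    for ip in ips:
        ipaddress.ip_address(ip)
    return ips


def build_hiera_data(answers: dict[str, str]) -> dict:
    """Turn questionnaire answers into Hiera data (all keys are hypothetical)."""
    mgmt = parse_ips(answers["mgmt_ips"])
    dbs = parse_ips(answers["db_ips"])
    if not mgmt or len(mgmt) != len(dbs):
        raise ValueError("each management server needs exactly one DB node")
    return {
        "site::name": answers["site"].strip(),
        "galera::cluster_address": "gcomm://" + ",".join(dbs),
        # Pair management servers with DB nodes one-to-one, in answer order.
        "cloudstack::db_host_by_mgmt": dict(zip(mgmt, dbs)),
    }


def ask_questions(ask=input) -> dict[str, str]:
    """Prompt for each question; `ask` is injectable for testing."""
    return {key: ask(f"{prompt}: ") for key, prompt in QUESTIONS}


def to_json(data: dict) -> str:
    """Serialize the data in a form a Hiera JSON data file can hold."""
    return json.dumps(data, indent=2, sort_keys=True)
```

Writing `to_json(build_hiera_data(ask_questions()))` to a file gives a data source that a Hiera `json_data` hierarchy level could point at. Validating addresses up front means a typo fails at the questionnaire rather than halfway through a Puppet run.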
So it's a work in progress; we'll continue to refine things as we go along. I'd love to hear any comments, questions, or suggestions, especially if they're written on a $20 bill.