5. Tools
There is no good way to get the following info:
I need a list of instances on a host and their IPs
I need to gracefully start/stop all instances on a host
Some tools need a hostname, some need an id (decimal or hex), some need a uuid
RACKSPACE® HOSTING | WWW.RACKSPACE.COM
6. Tools
SELECT instances.id, instances.hostname, instances.project_id,
       fixed_ips.address AS fixed_address,
       floating_ips.address AS floating_address
FROM instances
LEFT JOIN fixed_ips ON instances.id = fixed_ips.instance_id
LEFT JOIN floating_ips ON floating_ips.fixed_ip_id = fixed_ips.id
WHERE instances.deleted = 0
  AND instances.host = '<hostname of physical machine>'
ORDER BY instances.id;
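The query above can also be wrapped in a small script so that the host name is passed as a bound parameter instead of being pasted into the SQL. The following is a minimal sketch and not part of the original talk: it runs against an in-memory SQLite stand-in with a hypothetical, cut-down schema (only the columns the query touches) and made-up rows, and it assumes live instances carry deleted = 0. Against a real nova database you would swap in a MySQL driver and check the schema for your release.

```python
import sqlite3

# Same joins as the query on the slide; the host is a bound parameter.
# Assumption: live (non-deleted) rows have deleted = 0.
QUERY = """
SELECT instances.id, instances.hostname, instances.project_id,
       fixed_ips.address AS fixed_address,
       floating_ips.address AS floating_address
FROM instances
LEFT JOIN fixed_ips ON instances.id = fixed_ips.instance_id
LEFT JOIN floating_ips ON floating_ips.fixed_ip_id = fixed_ips.id
WHERE instances.deleted = 0 AND instances.host = ?
ORDER BY instances.id
"""


def instances_on_host(conn, host):
    """Return (id, hostname, project_id, fixed_ip, floating_ip) rows for one host."""
    return conn.execute(QUERY, (host,)).fetchall()


def demo_connection():
    """Build a throwaway database with a hypothetical minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE instances (id INTEGER PRIMARY KEY, hostname TEXT,
                                project_id TEXT, host TEXT, deleted INTEGER);
        CREATE TABLE fixed_ips (id INTEGER PRIMARY KEY, address TEXT,
                                instance_id INTEGER);
        CREATE TABLE floating_ips (id INTEGER PRIMARY KEY, address TEXT,
                                   fixed_ip_id INTEGER);
        INSERT INTO instances VALUES (1, 'web01', 'proj-a', 'compute01', 0);
        INSERT INTO instances VALUES (2, 'db01',  'proj-a', 'compute01', 1);
        INSERT INTO instances VALUES (3, 'app01', 'proj-b', 'compute02', 0);
        INSERT INTO instances VALUES (4, 'web02', 'proj-b', 'compute01', 0);
        INSERT INTO fixed_ips VALUES (1, '10.0.0.2', 1);
        INSERT INTO fixed_ips VALUES (2, '10.0.0.3', 3);
        INSERT INTO fixed_ips VALUES (3, '10.0.0.4', 4);
        INSERT INTO floating_ips VALUES (1, '192.0.2.10', 1);
        """
    )
    return conn


if __name__ == "__main__":
    for row in instances_on_host(demo_connection(), "compute01"):
        print(row)
```

Instances without a floating IP still appear, with an empty floating address, because both joins are LEFT JOINs.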
8. Tools
WE NEED BETTER OPS TOOLS!
Pulsar: https://github.com/rsoprivatecloud/pulsar
“nova swiss army knife”
requires direct nova database access
10. Tools
Holland (open source database backup framework)
Written by Rackspace DBAs
http://wiki.hollandbackup.org/
13. Tools
dsh
dsh -Mcg compute uname -a
bashfoo
for i in `knife node list | grep cpu`; do knife node run_list add $i "role[single-compute]"; done
for k in `seq 1 20`; do for i in {compute,network}; do nova-manage service disable computevm0$k nova-$i; done; done
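Before running a nested loop like the last one against 20 hosts, it can help to see exactly which commands it will issue. The sketch below is not from the talk; it only rebuilds the same command strings in Python as a dry run, using the host naming pattern and service names shown on the slide. The function name and its parameters are hypothetical.

```python
from itertools import product


def disable_commands(first=1, last=20, services=("compute", "network")):
    """Build the nova-manage commands the nested shell loop would run."""
    commands = []
    # product() iterates hosts in the outer position and services in the
    # inner position, matching the order of the shell loops.
    for k, service in product(range(first, last + 1), services):
        # Mirrors the slide's naming literally (computevm0$k), so k=10
        # yields "computevm010" rather than "computevm10".
        host = f"computevm0{k}"
        commands.append(f"nova-manage service disable {host} nova-{service}")
    return commands


if __name__ == "__main__":
    for command in disable_commands():
        print(command)
```

Printing the list first makes it easy to spot naming surprises, such as the three-digit suffix for hosts 10 to 20, before anything is disabled.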
14. Performance and Scale Considerations
Disk IO
For high performance, use remote block storage
For “local” disk IO, raw image type is only slightly faster than qcow2
IO will degrade while Glance copies images between machines
scheduler=cfq, KVM cache=none
15. Performance and Scale Considerations
Disk IO
[Figure: “Async Random IO”. randR and randW, buffered and direct, comparing the compute host (default, deadline scheduler, no ht) with instance test runs test12 and test13 under the cfq, noop and deadline schedulers with cache=none or cache=writeback. Axis: 0 to 1600.]
[Figure: “Host vs. Instance”. Compute host compared with instance test12 (cfq, cache=none) for randR, randW, seqR and seqW, buffered and direct. Axis: 0 to 14000.]
20. Performance and Scale Considerations
Swift disk usage with different chunk sizes
5 zones - 4 x 1TB disks per zone
20TB raw - 6.67TB usable
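The raw and usable figures follow from simple arithmetic. The sketch below reproduces them under one assumption the slide does not state: that the cluster keeps three replicas of every object, which is Swift's usual default. It ignores filesystem and ring overhead, and the function name is made up for illustration.

```python
def swift_capacity_tb(zones, disks_per_zone, disk_tb, replicas=3):
    """Return (raw_tb, usable_tb) for a Swift cluster.

    Assumption: usable space is raw space divided by the replica count;
    filesystem overhead is ignored.
    """
    raw_tb = zones * disks_per_zone * disk_tb
    usable_tb = raw_tb / replicas
    return raw_tb, usable_tb


if __name__ == "__main__":
    raw, usable = swift_capacity_tb(zones=5, disks_per_zone=4, disk_tb=1)
    print(f"{raw}TB raw - {usable:.2f}TB usable")
```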
22. Performance and Scale Considerations
Glance chunk size
Too high and swift can become unbalanced
What are the downsides to being too low?
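One way to reason about the trade-off is to count how many Swift objects a single image turns into at a given chunk size. The sketch below is an illustration only, not Glance's actual code: it assumes an image is stored as one object until it exceeds a threshold, and is otherwise split into fixed-size chunks. The two parameters correspond roughly to Glance's swift_store_large_object_size and swift_store_large_object_chunk_size settings; the numbers used here are illustrative, and the manifest object is not counted.

```python
import math


def swift_object_count(image_mb, large_object_mb=5120, chunk_mb=200):
    """Estimate how many Swift objects one image is stored as.

    Assumption: images at or below the threshold are stored whole;
    larger images are split into chunks of chunk_mb each.
    """
    if image_mb <= large_object_mb:
        return 1
    return math.ceil(image_mb / chunk_mb)


if __name__ == "__main__":
    for chunk_mb in (200, 1024, 4096):
        count = swift_object_count(16400, chunk_mb=chunk_mb)
        print(f"16400MB image, {chunk_mb}MB chunks: {count} objects")
```

Larger chunks mean fewer, bigger objects, so any one disk can end up holding a disproportionate share of an image; smaller chunks spread the data more evenly at the cost of more objects to write and track.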
23. Performance and Scale Considerations
Glance
Disk Tuning (swift)
read ahead on your block device(s) - no noticeable gain
deadline scheduler - no noticeable gain
Best thing for glance performance - Caching
25. Performance and Scale Considerations
Glance
Best thing for glance performance - Caching
Image Size    Not Cached     Cached
1.4GB         20secs         1sec
16.4GB        2min 21secs    1sec
* times from “creating image” to “qemu-img create”
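The cached and uncached timings can be turned into a rough estimate of average image preparation time for a given cache hit ratio. This is a back-of-the-envelope sketch rather than a measurement: the hit ratio is a hypothetical input, the model is a plain weighted average, and only the timings (20 s and 141 s uncached, 1 s cached) come from the talk.

```python
# Timings from the talk, in seconds: (not cached, cached).
TIMINGS = {"1.4GB": (20, 1), "16.4GB": (141, 1)}


def expected_seconds(hit_ratio, uncached_s, cached_s):
    """Weighted average of cached and uncached times for a given hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cached_s + (1.0 - hit_ratio) * uncached_s


if __name__ == "__main__":
    for size, (uncached_s, cached_s) in TIMINGS.items():
        for hit_ratio in (0.0, 0.5, 0.9):
            avg = expected_seconds(hit_ratio, uncached_s, cached_s)
            print(f"{size} image, hit ratio {hit_ratio:.0%}: {avg:.1f}s on average")
```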
26. Performance and Scale Considerations
Scheduler
What we use by default:
scheduler tasks are not processed in parallel
Adding additional schedulers helps provide HA, but they don’t speed up overall times to complete requests
27. Automated Config Management
Chef: http://github.com/rcbops/chef-cookbooks
Time to stand up:
controller - less than 20 minutes
compute node - less than 2 minutes
28. Day to Day tasks
Dealing with new issues
resize - all nova-compute processes need to be able to log into all other compute nodes via ssh keys
Hardware failures
We’re still managing infrastructure; failures happen
29. Lessons Learned
We need better Operations tools!
Network Design can be confusing for people used to “the old way”
OpenStack is still relatively new; help your organization understand it.
It’s easy to forget we’re working with Linux machines
It’s not you, it’s a bug :)
30. But....
But this is a design summit also
Open to discussions/thoughts/questions
Goal: The goal of this talk is to give the operations community an idea of how we are running our private clouds. The private cloud team(s) at Rackspace have approached our design with the following philosophies: stable code only (we only deploy code that we feel is stable enough for primetime); easy to deploy with repeatable, supportable processes (Chef, standardized network design); all open source (Linux, KVM).
In April, we were performing simple CDM agent-based monitoring. Now we have an integrated solution using open source tools: collectd, statsd, monit, graphite. Won’t go into much detail here, but check our Chef recipes (later) to see how it’s designed.
It’s getting better but still rough.
Can do things like: give all VMs on a host; get a usage report for an availability zone. We want this to be THE ops tool. I want this tool to get instance audit and remediation tools.
Should be safe to run on the master (InnoDB uses row-level locking), but it is always best to back up from a slave. Not yet a Chef cookbook.
qcow2 may make better “business” sense: faster to spin up, cheaper to store. Moral: benchmark for your workloads and plan accordingly. Mitigate with Cinder. A major benefit of private clouds is being able to customize these pieces.
Chunking sets the point at which Glance will break apart a large file, and the size of the resulting chunks.
Describe environment.
Is there a formula for ideal chunk size based on overall Swift cluster size? Offer discussion about ideal chunk size at end.
Downside to not removing images: disk space on compute nodes. Best of both worlds: a “smart” cache system that pre-caches often-used images.
Offer discussion at end about scaling the scheduler.
Repeatable process: a requirement for supporting multiple environments. Great for creating test environments.
Ops tools: perhaps an additional section in the dashboard. Multiple layer 3 networks on one layer 2 VLAN. Network devices (firewalls, load balancers, etc.) may be a limiting factor.
Discuss Glance caching, pros/cons of raw vs qcow2, and scaling the scheduler.