When we built our private cloud with OpenStack, we never thought it would be this complex or time-consuming. In this talk I want to share our approach, the challenges we faced, and why we learned to appreciate good monitoring and log aggregation.
Who am I?
● Kevin Honka
● Senior System Engineer at AD IT Systems
● Twitter: @piratehonk
● Mastodon: @piratehonk@norden.social
● Mail: kevin (at) honka.dev
Roadmap
● What is a private cloud?
● How does one build it?
● Pros / cons
● Monitoring
● Difficulties
What is a private cloud?
● Similar to
  – Google Cloud
  – AWS
  – Azure
● But on our own hardware
What is a private cloud?
● KVM on steroids
● Loads of services
● Lots of infrastructure automation
● Held together by tears and duct tape
Building a cloud
● 3 high-performance servers for OpenStack
  – Dedicated Fibre Channel
  – Dual sockets with high-core-count CPUs
  – All RAM slots populated for full memory-channel utilization
● 4 high-I/O servers for Ceph
  – Dedicated Fibre Channel
  – Single socket with a medium CPU
  – NVMe SSDs for storage
Building a cloud
● 4-node Ceph cluster with default settings
  – Set up using cephadm
● 3-node OpenStack cluster
  – 1 control/network node
  – 2 compute nodes
● Set up with kolla-ansible
  – A single run takes around 30–60 minutes
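A rough sketch of the tooling involved; the monitor IP and inventory path below are placeholders, not our actual values:

    # Bootstrap the Ceph cluster on the first storage node
    cephadm bootstrap --mon-ip 192.0.2.10
    # Let cephadm turn every unused disk into an OSD
    ceph orch apply osd --all-available-devices

    # Deploy OpenStack with kolla-ansible against a multinode inventory
    kolla-ansible -i ./multinode bootstrap-servers
    kolla-ansible -i ./multinode prechecks
    kolla-ansible -i ./multinode deploy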
Slow Interface / API
● Why is this happening?
  – Too many connections via HAProxy
    ● A single request can fan out into 500–2000 internal requests
● How do we resolve this?
  – Use HAProxy only for incoming requests
  – Remove HAProxy completely
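How much load HAProxy is actually carrying can be read from its admin socket; the socket path is a placeholder and depends on how the deployment configures it:

    # Aggregate counters: current connections, session rate, limits
    echo "show info" | socat stdio /var/lib/haproxy/haproxy.sock
    # Per-frontend/backend counters as CSV
    echo "show stat" | socat stdio /var/lib/haproxy/haproxy.sock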
Slow Interface / API
● Use HAProxy only for incoming requests
  – Minimal impact
  – Easy to configure
● Remove HAProxy completely
  – Loss of high availability
  – One less service to worry about
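A minimal sketch of the first option, assuming internal OpenStack endpoints are re-pointed at the node addresses so that only external traffic passes through the proxy; service names, addresses, and ports are placeholders:

    # haproxy.cfg (excerpt): public VIP for incoming API traffic only
    frontend public_api
        bind 203.0.113.10:5000
        default_backend keystone_api

    backend keystone_api
        balance roundrobin
        server control1 10.0.0.11:5000 check
    # Internal service-to-service calls use the catalog's internal
    # endpoints directly and no longer traverse HAProxy.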
Slow Interface / API
● Monitoring takeaways
  – Check logs for dropped connections
  – Monitor the number and duration of open TCP connections in the Linux kernel
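A quick way to check this on a node, plus the node_exporter metric that tracks it over time (verify the metric name against your exporter version; the port is a placeholder):

    # Kernel socket summary: established, time-wait, orphaned, ...
    ss -s
    # Established connections involving an API port
    ss -tan state established '( sport = :5000 or dport = :5000 )' | wc -l
    # In Prometheus, node_netstat_Tcp_CurrEstab exposes the same count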
NUMA swapping
● KVM processes jump between cores
● On a socket change, memory sits behind a different CPU
  – Increased memory access time
  – Slower PCIe access
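Whether a guest's memory is still local to the NUMA node its vCPUs run on can be checked with numastat from the numactl package, filtering for the QEMU processes that back KVM guests:

    # Per-NUMA-node memory usage of all QEMU/KVM guest processes
    numastat -c qemu
    # Same view for a single guest by PID (PID is a placeholder)
    numastat -p 12345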
Performance inconsistencies
● Activate CPU pinning
  – CPU cores will be exclusive to a single KVM thread
  – Fewer available resources on compute nodes
  – More compute nodes needed for the same number of VMs
● Run KVM NUMA-aware
  – KVM threads will always run on the same NUMA node
  – No exclusive cores
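In OpenStack, both options map to Nova settings and flavor extra specs; a minimal sketch, where the core range and flavor names are placeholders:

    # nova.conf on the compute nodes: host cores reserved for pinned guests
    #   [compute]
    #   cpu_dedicated_set = 4-31

    # Flavor whose vCPUs are pinned to dedicated host cores
    openstack flavor set m1.pinned --property hw:cpu_policy=dedicated

    # Flavor that keeps the guest on a single NUMA node, without pinning
    openstack flavor set m1.numa --property hw:numa_nodes=1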
Bad I/O
● Ceph RBD volumes for VMs
● Causes?
  – Network?
  – Wrong configuration?
  – Hardware limits?
Bad I/O
● Symptoms
  – Slow writes; less than 300 op/s
  – Inconsistent reads; fluctuating between 20k and 20 op/s
  – Slow commits; more than 50 msec
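Numbers like these can be reproduced with Ceph's own tooling; the pool and image names below are placeholders:

    # 4K random-write benchmark against a test RBD image
    rbd bench --io-type write --io-size 4K --io-threads 16 --io-pattern rand volumes/bench-img
    # Per-OSD commit/apply latency as tracked by the cluster itself
    ceph osd perf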
Bad I/O
● Searching for a solution
  – Many tips for optimizations
    ● Stabilized I/O, but did not raise it to the estimated levels
  – Estimation
    ● NVMe SSDs
    ● At least 100k op/s
    ● Fast commit to disk; less than 500 usec
Bad I/O
● Searching for a solution
  – The network runs at its peak of 20 GBps
  – Hardware resources are hardly touched
  – A possible problem with Ceph?
    ● Nothing in the documentation
    ● No recommendations
  – Accept it as fate and move to local storage?
Bad I/O
● A random link to a Ceph mailing list
  – OSDs should be at most 1 TB, else performance will be poor
Bad I/O
● Reconfiguring the Ceph cluster to OSDs with a max size of 1 TB
  – OSDs increase from 20 to 60
  – Each OSD gets its own core
    ● No NUMA swapping
  – Each SSD contains 3 OSDs
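With cephadm, splitting each drive into several OSDs can be declared in an OSD service spec; a sketch, assuming the NVMe drives are the hosts' only non-rotational data devices (service id and host pattern are placeholders):

    # osd_spec.yml
    service_type: osd
    service_id: nvme-split
    placement:
      host_pattern: 'ceph*'
    spec:
      data_devices:
        rotational: 0
      osds_per_device: 3

Applied with "ceph orch apply -i osd_spec.yml"; existing OSDs have to be drained and removed before cephadm recreates them under the new layout.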
What happened since then?
● Implementation of Prometheus for all services and servers
● Grafana dashboards for everything important
● Custom alert rules based on aggregated metrics
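For illustration, a Prometheus alert rule in this spirit; the metric is a standard node_exporter one, while the threshold is a made-up example rather than our production value:

    groups:
      - name: cloud-health
        rules:
          - alert: TooManyEstablishedConnections
            expr: node_netstat_Tcp_CurrEstab > 5000
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "High TCP connection count on {{ $labels.instance }}"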