Deploying MariaDB databases with containers at Nokia Networks

© 2018 Nokia
Deploying MariaDB databases with containers at Nokia
Deploying MariaDB solutions in containerized environments in Nokia Networks
Rick Lane
27-02-2019

CMDB - MariaDB
Common Software Foundation (CSF)
Component MariaDB (CMDB)

Helm/Kubernetes/Container tradeoffs
Pros
• Isolates services from the host OS and from other services
• Extremely lightweight and portable
• Containers are disposable
  o Kill and recreate a pod as new
  o Readiness/liveness probes automate recovery
• Deploy an application with multiple services in one command/click (helm umbrella charts)
• Deployment time significantly faster than VM/Ansible (4 minutes compared to 40 minutes)
Cons
• Containers are disposable
  o Recreated with a new IP (looks like a new server)
  o Root-cause analysis of failures is difficult: logs disappear with the container unless written to pod stdout or persistent storage
• Umbrella charts introduce other difficult problems
  o A helm upgrade of the parent chart can deploy a new service instance

Nokia Container Management Service
Helm/Kubernetes Deployment Model
[Diagram: the cmdb helm chart is deployed from a deploy node (deploy chart, deploy pods) onto a cluster of 2 controller, 5 worker and 2 edge nodes, with external connections]

Security / Affinity
Helm/Kubernetes Deployment Model
Security
• RBAC fully supported
  o All containers must run as a non-root user
  o Kubernetes RBAC ServiceAccount and Role/RoleBindings limit container privileges
• Password security
  o All user-supplied passwords are loaded into a Kubernetes secret during the pre-install-job
  o The secret is used to propagate passwords to the maxscale/mariadb pods
  o The password secret is deleted on post-terminate
  o The user must provide a secret with the old/new passwords to update passwords
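Password propagation rests on a standard Kubernetes Secret. As a minimal sketch (not the CMDB pre-install-job itself), the manifest such a job could create might be built as below; the secret name, namespace and the `root-password` key are hypothetical. Secret `data` values are only base64-encoded, not encrypted, which is one reason deleting the secret after use matters.

```python
import base64
import json


def build_password_secret(name: str, namespace: str, passwords: dict) -> dict:
    """Build a Kubernetes Secret manifest (as a dict) for user-supplied passwords.

    Kubernetes expects every value under `data` to be base64-encoded;
    this is an encoding, not encryption.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in passwords.items()
        },
    }


if __name__ == "__main__":
    # Hypothetical secret name and key, for illustration only.
    secret = build_password_secret("my-cluster-passwords", "default", {"root-password": "s3cret"})
    # kubectl apply -f accepts JSON manifests as well as YAML.
    print(json.dumps(secret, indent=2))
```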
Affinity
• podAntiAffinity
  o hard (default): all pods must be scheduled on separate nodes or the deployment will fail
  o soft: try to schedule pods on separate nodes, but deploy anyway if that is not possible
• nodeAffinity
  o mariadb pods are forced to deploy on worker nodes
  o maxscale pods by default deploy on edge nodes (can be configured to deploy on worker nodes)
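The hard/soft choice maps onto the two kinds of Kubernetes pod anti-affinity rules. The sketch below generates either stanza; the `app` label used to select the pods is an assumption, not the chart's actual label.

```python
import json


def pod_anti_affinity(mode: str, app_label: str) -> dict:
    """Return an `affinity` stanza that spreads pods labelled app=<app_label> across nodes.

    "hard": each pod must land on a different node (unschedulable pods stay Pending).
    "soft": the scheduler prefers different nodes but may co-locate pods.
    """
    term = {
        "labelSelector": {"matchLabels": {"app": app_label}},  # hypothetical label
        # One pod per value of this node label, i.e. one pod per node.
        "topologyKey": "kubernetes.io/hostname",
    }
    if mode == "hard":
        rules = {"requiredDuringSchedulingIgnoredDuringExecution": [term]}
    elif mode == "soft":
        rules = {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {"weight": 100, "podAffinityTerm": term}
            ]
        }
    else:
        raise ValueError(f"unknown anti-affinity mode: {mode!r}")
    return {"podAntiAffinity": rules}


if __name__ == "__main__":
    print(json.dumps(pod_anti_affinity("hard", "mariadb"), indent=2))
```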

Containers
CMDB - MariaDB
cmdb/mariadb (FROM centos-7.6 os base image)
MariaDB database container supporting deployment of all configurations (simplex, Galera, Master/Master, Master/Slave)
• MariaDB-10.3.11 (client, server, backup, etc)
• Galera
• SDC/etcd client RPMs
• CSF CMDB deployment, configuration and management RPMs
cmdb/maxscale (FROM centos-7.6 os base image)
MaxScale proxy container supporting deployment of data center configurations
• Maxscale-2.2.19
• CSF CMDB deployment, configuration and management RPMs
cmdb/admin (FROM kubectl base image)
Kubernetes/Helm Job Administration container supporting all life cycle events (install, upgrade, delete, etc)
• MariaDB-10.3.11-client
• Python job orchestrator and Python classes that implement configuration-specific job tasks

Helm chart (services and admin)
CMDB - MariaDB
## Image Registry
global:
  registry: "csf-docker-delivered.repo.internal.nokia.com"
  registry1: "registry1-docker-io.repo.internal.nokia.com"
rbac_enabled: true
nodeAntiAffinity: hard
cluster_name: "my-cluster"
## Topology: master-slave, master-master, galera, simplex
cluster_type: "master-slave"
## Values on how to expose services
## ClusterIP exposes a service only within the cluster; NodePort exposes it externally
services:
  ## MySQL service exposes the mysql database service (mariadb or maxscale)
  mysql:
    type: ClusterIP
  ## MariaDB Master exposes the pod that is master
  mariadb_master:
    type: NodePort
  ## Maxscale exposes the administrative interface of Maxscale
  maxscale:
    type: NodePort
  ## Maxctrl (optional) exposes the maxctrl administrative interface of Maxscale
  maxctrl:
    enabled: false
    type: ClusterIP
    port: 8989
admin:
  image:
    name: "cmdb/admin"
    tag: "4.5-1"
    pullPolicy: IfNotPresent
  ## A recovery flag. If changed, triggers a heal of the database
  #recovery: none
  quickInstall: ""
  ## If set, administrative jobs will be more verbose to stdout (kubectl logs)
  debug: false
  ## Exposes the hook-delete-policy. By default, this is set to delete the
  ## hooks only upon success. In helm v2.9+, this should be set to
  ## before-hook-creation. This can also be unset to avoid hook deletion
  ## for troubleshooting and debugging purposes
  hook_delete_policy: "hook-succeeded"

Helm chart (mariadb and maxscale)
CMDB - MariaDB
mariadb:
  image:
    name: "cmdb/mariadb"
    tag: "4.5-1"
  ## The number of MariaDB pods to create
  count: 3
  heuristic_recover: rollback
  use_tls: true
  ## Enable persistence using Persistent Volume Claims
  persistence:
    enabled: true
    accessMode: ReadWriteOnce
    size: 20Gi
    storageClass: ""
    resourcePolicy: delete
    preserve_pvc: false
  ## MariaDB server customized configuration
  mysqld_site_conf: |-
    [mysqld]
    userstat = on
  ## metrics
  metrics:
    enabled: false
  ## Grafana dashboard
  dashboard:
    enabled: false
maxscale:
  image:
    name: "cmdb/maxscale"
    tag: "4.5-1"
  ## The number of MaxScale pods
  count: 2
  ## MaxScale customized configuration
  maxscale_site_conf: |-
    [maxscale]
    threads = 2
    query_retries = 2
    query_retry_timeout = 10
    [MariaDB-Monitor]
    monitor_interval = 1000
    failcount = 4
  ## MaxScale promotion/demotion SQL
  sql:
    ## Mariadb Node promoted to master
    promotion: []
    ## Mariadb Node demoted to slave
    demotion: []
  ## leader-elector
  elector:
    image:
      name: "googlecontainer/leader-elector"
      tag: "0.5"

Events
Life Cycle Management
• Kubernetes native events
  o install = deploy chart and create resources
  o delete = terminate chart and delete all resources created by install
  o upgrade = make any changes to mariadb/maxscale resources (configuration, etc.); special code handles heal and scale-in/out events
• Nokia plugin events
  o heal = also implemented via a Kubernetes upgrade that changes the admin.recovery value
  o scale-in/scale-out = also implemented via a Kubernetes upgrade that changes mariadb.count or maxscale.count
  o backup/restore = implemented with the Backup/Restore policy
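Because heal and scaling are ordinary `helm upgrade` runs that change one value (for example `helm upgrade <release> <chart> --set mariadb.count=5`), an upgrade job has to work out which event it is handling. The helper below is a hypothetical sketch of that decision by diffing old and new chart values; it mirrors the mapping on this slide, not the actual CMDB admin code.

```python
def classify_upgrade(old: dict, new: dict) -> list:
    """Decide which life-cycle event(s) a helm upgrade represents.

    Hypothetical helper: admin.recovery change -> heal,
    mariadb.count / maxscale.count change -> scale-in or scale-out.
    """
    events = []
    if old.get("admin", {}).get("recovery") != new.get("admin", {}).get("recovery"):
        events.append("heal")
    for component in ("mariadb", "maxscale"):
        before = old.get(component, {}).get("count")
        after = new.get(component, {}).get("count")
        if before is None or after is None or before == after:
            continue
        direction = "scale-out" if after > before else "scale-in"
        events.append(f"{component} {direction}")
    # Anything else is an ordinary configuration upgrade.
    return events or ["upgrade"]
```

For example, `classify_upgrade({"mariadb": {"count": 3}}, {"mariadb": {"count": 5}})` returns `["mariadb scale-out"]`.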

Kubernetes Resources
Galera Cluster
• Deploy mariadb-statefulset with 3+ pods (odd number). Each MariaDB pod contains:
  o Mariadb container
    - Configures mariadb in Galera configuration automatically at deploy time based on IP advertisements
    - If the pod restarts, it is configured to always come back and rejoin the existing cluster
    - Persistent Volume Claim mounted for database storage
  o Backup/Restore container for scheduling routine mariadb container backups
  o Optional mysqld_exporter container for metrics collection (if metrics enabled)
• Mysql Service created to provide access to all DB nodes (all DB nodes added to the service as endpoints)
• Metrics Service created to provide access to DB nodes from the Grafana dashboard (if metrics enabled)
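The odd pod count follows from Galera's quorum rule: with equally weighted nodes, a partition stays Primary only while it holds a strict majority of the previous membership. This sketch computes quorum and the number of simultaneous node failures a cluster survives, assuming default (equal) node weights.

```python
def galera_fault_tolerance(cluster_size: int) -> tuple:
    """Return (quorum, tolerated_failures) for a Galera cluster of equally weighted nodes.

    Quorum is a strict majority: floor(n / 2) + 1 nodes.
    """
    if cluster_size < 1:
        raise ValueError("cluster size must be at least 1")
    quorum = cluster_size // 2 + 1
    return quorum, cluster_size - quorum


if __name__ == "__main__":
    for n in range(1, 7):
        quorum, tolerated = galera_fault_tolerance(n)
        print(f"{n} nodes: quorum {quorum}, survives {tolerated} failure(s)")
```

Going from 3 to 4 nodes (or 5 to 6) raises the quorum but not the number of failures survived, which is why an odd count is preferred.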

Galera Cluster
[Diagram: the mysql service load-balances across pods mariadb-0, mariadb-1 and mariadb-2; each pod runs mariadb, metrics and BR (backup/restore) containers and mounts its own volume]

Kubernetes Resources
Master/Slave with HA MaxScale
• Deploy maxscale-statefulset with 1 to 3 pods. Each MaxScale pod contains:
  o Maxscale container
    - Configures maxscale using helm values and the IPs advertised by the mariadb containers (via etcd)
    - Monitors http://localhost:4040 for leader-elector changes (setting passive mode on MaxScale instances that are not the leader)
  o Leader-elector container for managing HA
    - Manages a Kubernetes endpoint with a lease to elect the leader in the cluster
    - Starts a small web server that publishes the elected leader on port 4040
• Deploy mariadb-statefulset with 2+ pods (3+, odd number, preferred). Each MariaDB pod contains:
  o Mariadb container
    - Configures mariadb in Master/Slave/Slave configuration automatically at deploy time based on IP advertisements
    - If the pod restarts, it is configured to always come back as a Slave
    - Persistent Volume Claim mounted for database storage
  o Backup/Restore container for scheduling routine mariadb container backups
  o Optional mysqld_exporter container for metrics collection (if metrics enabled)
• Mysql Service created to provide access to all maxscale nodes (all maxscale nodes added to the service as endpoints)
• Maxctrl Service created to provide REST API access to the “active” maxscale node (labeled with ‘maxscale-leader’)
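A MaxScale container can decide whether it should be active by polling the elector on port 4040 and comparing the published leader with its own pod name (in a StatefulSet, a pod's hostname is its pod name). This is a sketch: the `{"name": "<leader>"}` response shape is the one commonly documented for the upstream leader-elector sidecar and is an assumption for the image used here.

```python
import json
import urllib.request


def fetch_leader(url: str = "http://localhost:4040", timeout: float = 2.0) -> str:
    """Read the raw response published by the leader-elector sidecar."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def maxscale_mode(elector_response: str, my_pod_name: str) -> str:
    """Return "active" if this pod is the elected leader, otherwise "passive"."""
    leader = json.loads(elector_response).get("name")
    return "active" if leader == my_pod_name else "passive"
```

For example, `maxscale_mode('{"name": "maxscale-1"}', "maxscale-1")` returns `"active"`, and the same response evaluated on `maxscale-0` returns `"passive"`.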

[Diagram: Master/Slave Cluster with HA MaxScale. Pods maxscale-0 (passive) and maxscale-1 (active) each run maxscale and elector containers; maxscale watches its elector, and the elector manages the maxctrl service endpoint, currently maxscale-1. The mysql service load-balances across the MaxScale pods, which front mariadb-0 (Master) and mariadb-1/mariadb-2 (Slaves); each MariaDB pod runs mariadb, metrics and BR containers with its own volume]

Pod IP Advertisements / Single Pod Failure
When a pod fails, it is re-created with a new IP address (so it looks like a new cluster server)
etcd server:
  cmdb/my-cluster/services/attributes/mariadb-0 = {"role": "RM", "ip": "172.16.0.35"}
  cmdb/my-cluster/services/attributes/mariadb-1 = {"role": "RS", "ip": "172.16.0.104"}
  cmdb/my-cluster/services/attributes/maxscale-0 = {"role": "MXS", "ip": "172.16.0.39"}
  cmdb/my-cluster/services/attributes/maxscale-1 = {"role": "MXS", "ip": "172.16.0.52"}
[Diagram: pods advertise their IPs to the etcd server (mariadb-0 172.16.0.35, mariadb-1 172.16.0.104, mariadb-2 172.16.0.97, maxscale-0 172.16.0.39, maxscale-1 172.16.0.52) and the advertisements are audited. When mariadb-2 is re-created with IP 172.16.0.201, MaxScale is updated with:]
maxadmin alter server mariadb-2 address=172.16.0.201
Audit advertisements

Galera Cluster Heal
Admin container heal operation (helm upgrade of admin.recovery value), run by the admin post-upgrade-job against pods mariadb-0, mariadb-1 and mariadb-2
etcd server:
  cmdb/my-cluster/actions/wait_role = {"advertise": "recovery_pos"}
  cmdb/my-cluster/services/recovery_pos/mariadb-0 = {"seqno": "527"}
  cmdb/my-cluster/mariadb-0/config/role = "--cluster=join:SST"
  cmdb/my-cluster/mariadb-1/config/role = "--cluster=new"
(1) Write wait_role action
(2) Kill all mariadb pods
(3) Pods advertise recovery_pos seqno values
(4) Find the pod with the largest seqno
(5) The pod with the largest seqno starts the cluster (--cluster=new); the rest join (--cluster=join:SST)
(6) Pods detect their role and re-deploy
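Steps (4) and (5) amount to choosing the node with the most recent committed transaction (largest seqno) as the bootstrap node, so no committed writes are lost. A minimal sketch, assuming the advertised values look like the `recovery_pos` entries above; the tie-break on the lowest pod ordinal is an assumption of this sketch.

```python
import json


def _ordinal(pod: str) -> int:
    """StatefulSet pods are named <name>-<ordinal>."""
    return int(pod.rsplit("-", 1)[1])


def assign_heal_roles(recovery_pos: dict) -> dict:
    """Map each pod to its Galera start role after a heal.

    recovery_pos: pod name -> advertised JSON, e.g. '{"seqno": "527"}'.
    The pod with the largest seqno bootstraps a new cluster; all others join via SST.
    """
    if not recovery_pos:
        raise ValueError("no recovery positions advertised")
    seqnos = {pod: int(json.loads(raw)["seqno"]) for pod, raw in recovery_pos.items()}
    # Highest seqno wins; on a tie the lowest ordinal is chosen (assumption).
    bootstrap = max(seqnos, key=lambda pod: (seqnos[pod], -_ordinal(pod)))
    return {
        pod: "--cluster=new" if pod == bootstrap else "--cluster=join:SST"
        for pod in seqnos
    }
```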

Galera Cluster Scale-Out
Admin container scale-out operation (helm upgrade of mariadb.count), adding pods mariadb-3 and mariadb-4
etcd server:
  cmdb/my-cluster/mariadb-4/config/role = "--cluster=join:SST"
(1) Set the new pods' roles (admin pre-upgrade-job)
(2) New pods created
(3) Notify existing pods of the new cluster size (admin post-upgrade-job)
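Step (1) writes a role key for every pod the StatefulSet is about to add; a StatefulSet scaled from `old_count` to `new_count` creates pods with ordinals `old_count` to `new_count - 1`. The sketch only builds the key/value pairs (writing them to etcd is left out, since the client used by CMDB is not shown); the `role` parameter lets the same helper produce the `--replicate=slave` roles of a MaxScale cluster scale-out.

```python
def new_pod_role_keys(cluster: str, old_count: int, new_count: int,
                      role: str = "--cluster=join:SST") -> dict:
    """Return the etcd role keys to write before new StatefulSet pods are created."""
    if new_count <= old_count:
        raise ValueError("not a scale-out")
    return {
        f"cmdb/{cluster}/mariadb-{i}/config/role": role
        for i in range(old_count, new_count)
    }
```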

Galera Cluster Scale-In
Admin container scale-in operation (helm upgrade of mariadb.count), removing pods mariadb-3 and mariadb-4
(1) Verify the new cluster size (admin pre-upgrade-job)
(2) Pods deleted
(3) Notify remaining pods of the new cluster size (admin post-upgrade-job)
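Verification in step (1) can also tell the admin job which pods will disappear, because a StatefulSet removes pods from the highest ordinal down. This sketch assumes the sizing rule from the Galera resources slide (3+ pods, odd number) is what "verify" enforces; the real check may differ.

```python
def plan_galera_scale_in(current_count: int, new_count: int) -> list:
    """Validate a Galera scale-in and return the pods the StatefulSet will delete.

    Assumed rule: the remaining cluster must be an odd number of pods, at least 3.
    """
    if new_count >= current_count:
        raise ValueError("not a scale-in")
    if new_count < 3 or new_count % 2 == 0:
        raise ValueError("Galera cluster size should be an odd number >= 3")
    # Highest ordinals are removed first.
    return [f"mariadb-{i}" for i in range(current_count - 1, new_count - 1, -1)]
```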

MaxScale Cluster Heal
Maxscale will auto-heal the MariaDB cluster when all database pods fail
[Diagram: pods mariadb-0, mariadb-1 and mariadb-2 as one Master and two Slaves; the Master replicates from a remote data center]
Topology Audit (no audit if event < 15 seconds)
• After all pods restart:
  o The original master will be replicating from the remote DC (Slave, Running)
  o The original slaves will still be replicating from the old master (Running)
• Expected_master = first server replicating from the remote DC
• If all other servers are replicating from the same server (the old master):
  o For all servers (except expected_master): CHANGE MASTER TO expected_master
  o Run promotion.sql
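The audit rule can be expressed as a small pure function over the observed replication topology. In this hypothetical sketch each server reports the host it replicates from; the function returns the expected master and the statements to run on the other servers, and the caller would then run the configured promotion SQL (`maxscale.sql.promotion`) on the expected master. It assumes each slave already uses GTID-based replication, so only `MASTER_HOST` needs to change.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class Server(NamedTuple):
    name: str
    address: str
    master_host: Optional[str]  # host this server replicates from, None if not replicating


def topology_audit(servers: List[Server],
                   remote_hosts: Set[str]) -> Tuple[Optional[str], Dict[str, List[str]]]:
    """Return (expected_master, {server: SQL statements}); (None, {}) means no action."""
    # Expected master: first server replicating from the remote data center.
    expected = next((s for s in servers if s.master_host in remote_hosts), None)
    if expected is None:
        return None, {}
    others = [s for s in servers if s.name != expected.name]
    sources = {s.master_host for s in others}
    # Act only if every other server replicates from the same (old) master
    # and is not already pointing at the expected master's current address.
    if len(sources) != 1 or None in sources or expected.address in sources:
        return None, {}
    statements = {
        s.name: [
            "STOP SLAVE",
            f"CHANGE MASTER TO MASTER_HOST='{expected.address}'",
            "START SLAVE",
        ]
        for s in others
    }
    return expected.name, statements
```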

MaxScale Cluster Scale-Out
Admin container scale-out operation (helm upgrade of mariadb.count), adding mariadb-3 and mariadb-4 to mariadb-0 (Master), mariadb-1 and mariadb-2, using an admin pre-upgrade-job and post-upgrade-job
etcd server:
  cmdb/my-cluster/mariadb-3/config/role = "--replicate=slave"
  cmdb/my-cluster/mariadb-4/config/role = "--replicate=slave"
  cmdb/my-cluster/actions/wait_role = {"advertise": "ready"}
  cmdb/my-cluster/services/ready/mariadb-3 = 'true'
  cmdb/my-cluster/services/ready/mariadb-4 = 'true'
(1) Make sure a master exists
(2) Write wait_role action
(3) New pods created
(4) Backup Master (mariadb-0)
(5) As ready pods are detected, restore them from the master backup and advertise the pod role
(6) Notify existing pods of the new cluster size
MaxScale: maxadmin create server <server> …
          maxadmin add server <server>
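Step (5) is a polling loop over etcd: while the `wait_role` action asks for `ready` advertisements, every new pod that has advertised readiness and has not been restored yet gets the master backup. The sketch below computes that set from a snapshot of etcd keys; it assumes the ready value is stored as the string `true`, and the restore itself is out of scope.

```python
import json


def pods_ready_for_restore(etcd: dict, cluster: str, new_pods: list, restored: set) -> list:
    """Return new pods that have advertised readiness and still need a restore."""
    action = json.loads(etcd.get(f"cmdb/{cluster}/actions/wait_role", "{}"))
    if action.get("advertise") != "ready":
        return []  # the admin job is not currently waiting for readiness
    prefix = f"cmdb/{cluster}/services/ready/"
    return [
        pod for pod in new_pods
        if etcd.get(prefix + pod) == "true" and pod not in restored
    ]
```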

MaxScale Cluster Scale-In
Admin container scale-in operation (helm upgrade of mariadb.count), removing mariadb-3 and mariadb-4, using an admin pre-upgrade-job and post-upgrade-job
(1) Verify the new cluster size
(2) Switch over the Master via MaxScale if necessary
(3) Pods deleted
(4) Notify remaining pods of the new cluster size
MaxScale: maxadmin remove server <server>
          maxadmin destroy server <server>
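"If necessary" in step (2) means the current master has an ordinal that the scale-in will remove. The sketch below decides whether a switchover is needed and picks a surviving slave as the new master. Choosing the lowest-ordinal survivor is an arbitrary simplification; a real implementation would more likely pick the most up-to-date slave, and the actual switchover would be issued through MaxScale.

```python
from typing import List, Optional


def _ordinal(pod: str) -> int:
    """StatefulSet pods are named <name>-<ordinal>."""
    return int(pod.rsplit("-", 1)[1])


def switchover_target(master: str, slaves: List[str], new_count: int) -> Optional[str]:
    """Return the slave to promote before scale-in, or None if the master survives.

    A StatefulSet scale-in deletes pods with ordinal >= new_count.
    """
    if _ordinal(master) < new_count:
        return None  # master is kept; no switchover needed
    survivors = sorted((s for s in slaves if _ordinal(s) < new_count), key=_ordinal)
    if not survivors:
        raise ValueError("no surviving slave to promote")
    return survivors[0]  # simplification: lowest-ordinal survivor
```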

Future Work
• Additional enhancements to prevent data loss
  o Support semi-sync replication in the Master/Slave/Slave cluster with MaxScale
  o Implement a preStop hook to trigger a switchover if the Master is being deleted (e.g., for migration)
• Kubernetes Horizontal Pod Autoscaling (HPA)
