The document discusses an agenda for a meetup about Redis Labs and Kubernetes operators. Key points:
- An introduction to Redis Enterprise architecture and Redis Labs products.
- A discussion of "double orchestration" using Kubernetes and PKS to manage Redis clusters for performance and resource management.
- An overview of Redis Labs' Kubernetes solution using StatefulSets, services, and a custom controller.
- An introduction to operators, how they provide lifecycle management and simplify deployments compared to static YAML files or Helm.
- Details on Redis Labs' operator development process and challenges in building idempotent APIs and handling state changes and validation in the reconciliation loop.
4. Introduction to Redis Enterprise
Redis: Open source. The leading in-memory database platform, supporting any high-performance operational, analytics, or hybrid use case.
Redis Labs: The open source home and commercial provider of Redis Enterprise (Redise) technology, platform, products & services.
7. Redise - Open Source & Proprietary Technology
Redise Cluster
• Shared nothing, symmetrical cluster architecture
• Fully compatible with open source commands & data structures
Redise Node (diagram labels): Cluster Manager, Enterprise Layer, Open Source Layer, REST API, Zero latency proxy
8. Redise in Containers
• Faster time to market with continuity between dev/test and production environments that use Redise Pack
• Highly available, easier to scale, simpler to manage Redis technology, integrated with orchestration tools such as PCF, Kubernetes, Mesosphere...
• Node in a container approach: all Redise services inside each container.
Run Redise clusters on single or multiple nodes
9. Node in a Pod Approach
Diagram: two three-node layouts (Node 1, Node 2, Node 3) shown side by side.
One pod, multiple services per node vs. multiple pods, multiple services per node
13. Why like this?
• Resource management - Orchestration platforms are
designed to be generic.
• Again - Performance is king.
• Last but not least, it allows us to maintain a common
architecture - regardless of running environment, be it bare
metal, VM, K8s, Pivotal Cloud Foundry.
(... Surprisingly enough, not everybody in the world uses containers ...)
14. Who Does What
• Node auto-healing
• Node scaling
• Failover & scaling
• Configuration & monitoring
• Service discovery
• Upgrade
+
15. And specifically on Kubernetes
Diagram labels: Node in a Pod, StatefulSet, Persistent Volumes, Custom Controller, Services Manager
17. Redis Labs on Kubernetes - Building Blocks
StatefulSet
Our cluster nodes are deployed as part of a StatefulSet
Affinity
Allows us to control the topology of the Redis Labs cluster nodes
Redis Labs Service Manager
Creates/updates/deletes service entries for each Redis DB hosted on the cluster
RBAC
The Service Manager must have permissions to access the namespace to create services
Ingress
Allows access to Redis DBs from outside of the k8s cluster
18. StatefulSet
• Introduced in Kubernetes 1.5, GA in 1.9
• Statefulset Pod consistency
– Pod naming
– Scale-out/Scale-in
– Pod Upgrade
• Persistent Disks
– Same PVC will be used when Pod is (re)scheduled
• All Pods are uniform
• Recovery from error state
Diagram: pod-0, pod-1, and pod-2, each attached to its own PV
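The naming and ordering guarantees above can be sketched in a few lines. This is only an illustration of the conventions Kubernetes applies with the default ordered pod management, not Redis Labs code; the names `rec` and `data` are hypothetical.

```python
from typing import List, Tuple


def pod_names(statefulset: str, replicas: int) -> List[str]:
    """Pods get stable names: <statefulset-name>-<ordinal>, ordinals start at 0."""
    return [f"{statefulset}-{i}" for i in range(replicas)]


def pvc_name(claim_template: str, pod: str) -> str:
    """A pod's claim is named <volumeClaimTemplate-name>-<pod-name>,
    which is why a rescheduled pod finds the same PVC again."""
    return f"{claim_template}-{pod}"


def scale_steps(statefulset: str, current: int, target: int) -> List[Tuple[str, str]]:
    """Scale-out creates pods in ascending ordinal order;
    scale-in deletes the highest ordinal first."""
    if target >= current:
        return [("create", f"{statefulset}-{i}") for i in range(current, target)]
    return [("delete", f"{statefulset}-{i}") for i in range(current - 1, target - 1, -1)]


if __name__ == "__main__":
    print(pod_names("rec", 3))        # hypothetical StatefulSet name
    print(pvc_name("data", "rec-1"))  # hypothetical claim template name
    print(scale_steps("rec", 3, 1))
```

Because names and claims are derived from the ordinal, pod-1 always comes back as pod-1 with the same volume, which is what a stateful cluster node needs.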
19. Pod features
• Anti-affinity
– Allows us to control where the pods are being scheduled
• Readiness Probes
– Allows us to control the action flows to avoid data loss
• Pre-stop hook
– Drain the node and move resources to a different node
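As a rough sketch of how these three features appear in a pod template, the function below builds the relevant part of a pod spec as a plain Python dict. The Kubernetes field names are real; the label, image, container name, and the two shell commands (`check-node-ready`, `drain-node`) are hypothetical placeholders, not the commands Redis Labs ships.

```python
def redis_node_pod_spec(app_label: str, image: str) -> dict:
    """Build a pod spec fragment with anti-affinity, a readiness probe,
    and a preStop hook. Commands and names are illustrative assumptions."""
    return {
        "affinity": {
            "podAntiAffinity": {
                # Never schedule two pods with the same label on one host.
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "labelSelector": {"matchLabels": {"app": app_label}},
                        "topologyKey": "kubernetes.io/hostname",
                    }
                ]
            }
        },
        "containers": [
            {
                "name": "redis-node",  # hypothetical container name
                "image": image,
                # The rollout waits until this probe succeeds before
                # moving on to the next pod.
                "readinessProbe": {
                    "exec": {"command": ["/bin/sh", "-c", "check-node-ready"]},
                    "periodSeconds": 10,
                },
                # Runs before the container is stopped: drain the node first.
                "lifecycle": {
                    "preStop": {
                        "exec": {"command": ["/bin/sh", "-c", "drain-node"]}
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    import json
    print(json.dumps(redis_node_pod_spec("redis-enterprise", "example/redis-node:latest"), indent=2))
```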
20. Redis Enterprise Services Manager
Why?
• Redis Enterprise is a multi-tenant Redis cluster
• A Redis Enterprise database can have one or more network endpoints
Problem
• Expose databases as a service instance
Solution
• Python-based application that creates, deletes, or updates the necessary database service entries
• Based on an idempotent reconciliation loop
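The idea of an idempotent loop that keeps service entries in line with the databases can be sketched as follows. This is a minimal illustration, not the actual Services Manager: it assumes one port per database and models both sides as plain dicts, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def plan_service_changes(
    databases: Dict[str, int], services: Dict[str, int]
) -> List[Tuple[str, str]]:
    """Compare desired entries (database name -> endpoint port) with the
    existing service entries and return the create/update/delete actions."""
    actions: List[Tuple[str, str]] = []
    for name, port in sorted(databases.items()):
        if name not in services:
            actions.append(("create", name))
        elif services[name] != port:
            actions.append(("update", name))
    for name in sorted(services):
        if name not in databases:
            actions.append(("delete", name))
    return actions


def apply_changes(
    databases: Dict[str, int],
    services: Dict[str, int],
    actions: List[Tuple[str, str]],
) -> Dict[str, int]:
    """Return the service entries after applying the planned actions."""
    result = dict(services)
    for verb, name in actions:
        if verb == "delete":
            result.pop(name)
        else:  # create or update
            result[name] = databases[name]
    return result


if __name__ == "__main__":
    dbs = {"db1": 12000, "db2": 12001}
    svcs = {"db2": 11000, "old-db": 13000}
    plan = plan_service_changes(dbs, svcs)
    print(plan)
    # Running the plan again on the result yields no further actions.
    print(plan_service_changes(dbs, apply_changes(dbs, svcs, plan)))
```

The loop never needs to know what changed; it only compares what should exist with what does, so running it repeatedly is safe.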
23. Redis Labs Challenges
• Provide a solid primary db solution for end-users
• Stateful application
– Some changes cannot be performed
– Some changes need to mutate the state before applying the actual change
– Data-loss is unacceptable
• Support multiple k8s deployments
– Cloud: GKE, AWS, etc
– Openshift
– PKS
– Vanilla
– On-prem hardware vendor
• Ingress
• Packaging
24. Our journey
• Started out with 9 static YAML files
– Hard to deploy
– Hard to maintain
– Hard to distribute
– No control over the deployment life-cycle
• Helm
– Customized deployment
– Easier to maintain
– Not fully supported everywhere
– No control over the deployment life-cycle
• Operator
– Simple deployment (2 YAML files)
– Full control over life-cycle
– K8s compatible
31. Why are operators useful?
● Life Cycle Control
○ Scale Up → Add new pod, Rebalance Data
○ Healing → Restore Backups, Auto Recovery
○ Backup
○ Validations (e.g. even # pods)
● Configuration
○ Automate complex deployments (e.g. Vault cluster and etcd cluster)
○ Reconfiguration
○ Agnostic configuration (e.g. PVC by cloud provider)
● 3rd party resources (e.g. Prometheus)
● Cross distribution
● Easy to deploy
32. Our Upgrade Flow
In a Redis Enterprise Cluster we need to:
1. Drain pod
2. Stop pod
3. Start new pod
● Downgrade - not supported (oss backward compatibility)
33. Our Upgrade Flow
With YAML/Helm -
We used a lifecycle preStop hook in the StatefulSet
1. Encoded inside the yaml - cumbersome
2. Cannot validate version
3. No error handling
With Operator -
1. Maintain logic in code not in a config file
2. Validations: not a downgrade, cluster is not already in an upgrade process
3. Error handling
4. Manage canary deployment
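The validations listed for the operator are easier to express in code than in a preStop hook. Below is a minimal sketch, assuming dotted numeric version strings; the function and exception names are hypothetical and this is not the operator's actual implementation.

```python
from typing import List, Tuple


class UpgradeRejected(Exception):
    """Raised when a requested upgrade fails validation."""


def parse_version(text: str) -> Tuple[int, ...]:
    """Assumes versions look like '5.4.2'; compares numerically, not as text."""
    return tuple(int(part) for part in text.split("."))


def validate_upgrade(current: str, requested: str, upgrade_in_progress: bool) -> None:
    """Reject the request if the cluster is mid-upgrade or if it is a downgrade."""
    if upgrade_in_progress:
        raise UpgradeRejected("cluster is already in an upgrade process")
    if parse_version(requested) < parse_version(current):
        raise UpgradeRejected("downgrade is not supported")


def upgrade_steps(pods: List[str]) -> List[Tuple[str, str]]:
    """Per-pod flow from the slide: drain, stop, then start the new pod."""
    steps: List[Tuple[str, str]] = []
    for pod in pods:
        steps += [("drain", pod), ("stop", pod), ("start", pod)]
    return steps


if __name__ == "__main__":
    validate_upgrade("5.4.0", "5.10.0", upgrade_in_progress=False)  # accepted
    print(upgrade_steps(["rec-0", "rec-1"]))
    try:
        validate_upgrade("5.4.0", "5.2.0", upgrade_in_progress=False)
    except UpgradeRejected as err:
        print("rejected:", err)
```

Parsing the version into integers matters: a plain string comparison would treat "5.10.0" as older than "5.4.0".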
40. • Started by CoreOS
– CoreOS pioneered the pattern by creating a few Operators (Prometheus & Vault)
• Operator SDK:
– Minimizes boilerplate and helps developers get started writing Operators
• The Basic API:
– Register Watchers on any Resource
– Create/Read/Update/Delete/Get on any resource
– Register schemas using the Kubernetes Go API
• Operator Lifecycle Manager
41. The Reconciliation/Control Loop
• Called for every update, delete or creation on the watched resources
– No way of knowing what type of event except Delete
• Called every X seconds to “resync” resources
• Our responsibility is to allow the user to use our resource as any other in k8s
– AKA idempotent API
• On every call to the handler we receive our watched resources and must determine exactly what to do
42. Idempotent APIs
Desired State = Current Resource
Current State =
• Aggregation of deployed resources
• Internal application state
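One way to picture this: the handler derives its actions purely from the desired state in the resource and the current state, so the same input always yields the same plan and a converged cluster yields none. The sketch below is an assumption-laden illustration; `CurrentState`, its two fields, and the action strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CurrentState:
    statefulset_replicas: int  # aggregated from the deployed k8s resources
    cluster_nodes: int         # reported by the application's internal state


def reconcile(desired_nodes: int, state: CurrentState) -> List[str]:
    """Return the actions needed to converge on the desired node count.
    No event type is consulted; only desired vs. current state."""
    actions: List[str] = []
    if state.statefulset_replicas != desired_nodes:
        actions.append(f"scale statefulset to {desired_nodes}")
    if state.cluster_nodes < desired_nodes:
        actions.append("join new nodes and rebalance")
    return actions


if __name__ == "__main__":
    print(reconcile(5, CurrentState(statefulset_replicas=3, cluster_nodes=3)))
    # Pods exist but have not joined the cluster yet: only the second step remains.
    print(reconcile(5, CurrentState(statefulset_replicas=5, cluster_nodes=3)))
    # Converged: nothing to do, however often the handler is called.
    print(reconcile(5, CurrentState(statefulset_replicas=5, cluster_nodes=5)))
```

Looking at both the deployed resources and the application's own state is what lets the loop resume a half-finished change after a resync.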
43. The Reconciliation/Control Loop - Challenges
• Determine which changes need to happen
• Determine if the change is valid
• K8s doesn’t provide a solid validation before applying changes to CR
– 1.9 has a beta feature for OpenAPI validations
• Long running processes as part of a resource change
44. Redis Cluster Status
State diagram: Pending Creation, Running, Invalid, Error, with transitions triggered by create and apply
Pending Creation - initial state where cluster is not deployed yet
Running - Cluster Deployed and is either running or starting to run and not ready yet
Invalid - Invalid configuration was requested. E.g. even #nodes. Until a valid configuration is applied the status will remain invalid
Error - Error when trying to deploy or update the Redis Enterprise Cluster
create = kubectl create -f cr.yaml
apply = kubectl apply -f cr.yaml
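The four statuses can be sketched as a small function that re-evaluates the requested configuration on every create or apply. The exact edges of the original diagram are not recoverable here, so this sketch assumes every create or apply runs the same evaluation; the `nodes` field and the `deploy` callable are hypothetical.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    PENDING_CREATION = "Pending Creation"  # initial state, nothing deployed yet
    RUNNING = "Running"
    INVALID = "Invalid"
    ERROR = "Error"


def is_valid(spec: dict) -> bool:
    """Example validation from the slide: an even number of nodes is invalid."""
    nodes = spec.get("nodes", 0)
    return nodes > 0 and nodes % 2 == 1


def on_create_or_apply(spec: dict, deploy: Callable[[dict], None]) -> Status:
    """Evaluate a created or applied spec and return the resulting status.
    `deploy` deploys or updates the cluster and raises on failure."""
    if not is_valid(spec):
        return Status.INVALID  # stays Invalid until a valid spec is applied
    try:
        deploy(spec)
    except Exception:
        return Status.ERROR
    return Status.RUNNING


if __name__ == "__main__":
    def fake_deploy(spec: dict) -> None:
        """Stand-in for the real deployment step."""

    status = Status.PENDING_CREATION
    for spec in ({"nodes": 4}, {"nodes": 3}):
        status = on_create_or_apply(spec, fake_deploy)
        print(spec, "->", status.value)
```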
45. Development Challenges
• Deep understanding of how Kubernetes works (statefulsets, controller, APIs)
• Workflows - Idempotent APIs are challenging due to state mutation
• Double Orchestration - Adds a level of complexity compared to stateless
deployments
• Various SDK issues