Kubernetes Operators allow you to create custom resources in Kubernetes. They are popular for managing databases, which tend to be complex to manage. Our team built an operator to stand up ClickHouse, a popular open source data warehouse, in Kubernetes clusters. We'll share major learnings from this experience which we feel are applicable generally to running scalable, high performance databases in this environment.

The talk starts with a level-set of Kubernetes, ClickHouse, and what an operator does. We'll then jump into the design of the ClickHouse operator example, covering challenges associated with the following problems:

* Reducing the complexity of Kubernetes through definition of new resources for databases
* Defining and managing storage
* Performance, including comparative results which look pretty good
* Monitoring
* Upgrade and configuration changes

Kubernetes is not free from challenges, and we'll cover these as we touch on each point above. We'll conclude with a summary of reasons that we think Kubernetes is a great environment for data warehouses, based on our experience to date.
2. Brief Intros
www.altinity.com
Leading software and services provider for ClickHouse
Major committer and community sponsor in US and Western Europe
Robert Hodges - Altinity CEO
30+ years on DBMS plus virtualization and security.
ClickHouse is DBMS #20
3. Why run a data warehouse on Kubernetes?
1. Same environment as other cloud native services
2. Portability
3. Fast deployment cycles
4. Flexible mapping to resources
5. Introduction to ClickHouse
Understands SQL
Runs on bare metal to cloud
Shared nothing architecture
Stores data in columns
Parallel and vectorized execution
Scales to many petabytes
Is open source (Apache 2.0)
6. ClickHouse structure is optimized for speed
Diagram: a Table consists of Parts; each Part holds an Index and Columns.
Callouts: Indexed, Sorted, Compressed
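The "indexed, sorted" callouts can be illustrated with a small sketch. It assumes a part is a list of keys kept in sorted order and that the index stores only the first key of every few rows (a sparse index). The granule size, function names, and data are hypothetical and are not ClickHouse internals.

```python
from bisect import bisect_left, bisect_right

GRANULE = 4  # hypothetical rows per index entry; kept tiny for illustration


def build_sparse_index(sorted_keys, granule=GRANULE):
    """Keep the first key of every granule of rows."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule)]


def candidate_rows(index, n_rows, lo, hi, granule=GRANULE):
    """Return the (start, stop) row range that may hold keys in [lo, hi]."""
    # Last granule whose first key is strictly below lo (safe with duplicate keys).
    first = max(bisect_left(index, lo) - 1, 0)
    # First granule whose first key is above hi; everything from there is skipped.
    last = bisect_right(index, hi)
    return first * granule, min(last * granule, n_rows)


keys = list(range(0, 40, 2))        # one sorted key column: 0, 2, ..., 38
index = build_sparse_index(keys)    # one entry per 4 rows
start, stop = candidate_rows(index, len(keys), 10, 18)
matches = [k for k in keys[start:stop] if 10 <= k <= 18]
print(index, (start, stop), matches)
```

Because the keys are sorted, only the rows between `start` and `stop` need to be read and filtered; the rest of the part is skipped.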
9. “Kubernetes is the new Linux”
Actually it’s an open-source platform to:
● manage container-based systems
● build distributed applications declaratively
● allocate machine resources efficiently
● automate application deployment
10. A typical distributed service
Diagram: Traffic enters a Load Balancer, which routes to Service #1, Service #2, and Service #3; each service has its own Storage.
11. Defined using Kubernetes resources
Diagram: a Service “svc” and a Stateful Set manage three Pods (labels shown: “svc-1”, “svc-2”). Each Pod has a Persistent Volume Claim bound to a Persistent Volume. Config Maps and Secrets supply configuration.
14. ClickHouse on Kubernetes is complex!
Diagram: Zookeeper Services in front of Zookeeper-0, Zookeeper-1, and Zookeeper-2. A Load Balancer Service in front of four replicas (Shard 1 Replica 1, Shard 1 Replica 2, Shard 2 Replica 1, Shard 2 Replica 2), each with its own Replica Service.
Each replica: Stateful Set, Pod, Persistent Volume Claim, Persistent Volume, Per-replica Config Map
Also shown: User Config Map, Common Config Map
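To get a feel for the number of objects involved, here is a rough sketch that enumerates what the diagram implies. It assumes exactly one object of each per-replica kind per replica, which is read off the diagram rather than confirmed; the object names are hypothetical, and ZooKeeper is left out.

```python
SHARED = ["Service/load-balancer", "ConfigMap/user", "ConfigMap/common"]
PER_REPLICA_KINDS = [
    "Service", "StatefulSet", "Pod",
    "PersistentVolumeClaim", "PersistentVolume", "ConfigMap",
]


def resources_for_cluster(shards, replicas):
    """List the Kubernetes objects implied by the diagram for one cluster."""
    objects = list(SHARED)
    for s in range(1, shards + 1):
        for r in range(1, replicas + 1):
            for kind in PER_REPLICA_KINDS:
                objects.append(f"{kind}/shard{s}-replica{r}")
    return objects


objs = resources_for_cluster(shards=2, replicas=2)
print(len(objs))  # 3 shared + 2 * 2 * 6 per-replica = 27
```

Under these assumptions, even a 2 x 2 cluster needs 27 objects before ZooKeeper is counted.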
15. Operators encapsulate complex deployments
Diagram: the ClickHouse Operator runs in the kube-system namespace (Apache 2.0 source, distributed as Docker image). A ClickHouse Resource Definition (single specification) in your-favorite namespace yields a best practice deployment.
17. Basic data warehouse topology
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "ch01"                # Name used to identify all resources
spec:
  configuration:
    clusters:                 # Definition of cluster
      - name: replicated
        layout:
          shardsCount: 2
          replicasCount: 2
    zookeeper:                # Location of service we depend on
      nodes:
        - host: zookeeper.zk
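As a sketch of what the layout implies, the function below expands a spec (already parsed into a Python dict) into one entry per replica. The `chi-<shard>-<replica>` pod names mirror the labels on the upgrade slide, not necessarily the operator's real naming. The check that a replicated cluster needs ZooKeeper nodes is an assumption added for illustration.

```python
def expand_layout(spec):
    """Turn a parsed ClickHouseInstallation spec into a flat list of replicas."""
    config = spec["configuration"]
    zk_nodes = config.get("zookeeper", {}).get("nodes", [])
    replicas = []
    for cluster in config["clusters"]:
        layout = cluster.get("layout", {})
        shards = layout.get("shardsCount", 1)
        per_shard = layout.get("replicasCount", 1)
        # Assumed rule: more than one replica per shard requires ZooKeeper.
        if per_shard > 1 and not zk_nodes:
            raise ValueError(f"cluster {cluster['name']!r} needs zookeeper nodes")
        for s in range(shards):
            for r in range(per_shard):
                replicas.append({"cluster": cluster["name"], "shard": s,
                                 "replica": r, "pod": f"chi-{s}-{r}"})
    return replicas


spec = {
    "configuration": {
        "clusters": [{"name": "replicated",
                      "layout": {"shardsCount": 2, "replicasCount": 2}}],
        "zookeeper": {"nodes": [{"host": "zookeeper.zk"}]},
    }
}
pods = [r["pod"] for r in expand_layout(spec)]
print(pods)
```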
18. Simplicity requires defaults
defaults:
  templates:
    volumeClaimTemplate: persistent
    podTemplate: clickhouse:19.6
    serviceTemplate: minikube
templates:
  volumeClaimTemplates:
    - name: persistent        # Name of template
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
# Storage misconfigurations lead to insidious errors
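One way to catch a misnamed template early is to resolve every default against the named definitions before anything is deployed. The sketch below does this on a spec parsed into a dict. The mapping from the singular default keys to plural sections follows the pattern on the slide; `serviceTemplates` is assumed by analogy, and the function name is hypothetical.

```python
# Default key (singular) -> section that holds the named definitions (plural).
SECTIONS = {
    "volumeClaimTemplate": "volumeClaimTemplates",
    "podTemplate": "podTemplates",
    "serviceTemplate": "serviceTemplates",  # assumed by analogy
}


def resolve_defaults(spec):
    """Return (resolved templates, list of defaults that name nothing)."""
    defaults = spec.get("defaults", {}).get("templates", {})
    templates = spec.get("templates", {})
    resolved, missing = {}, []
    for key, name in defaults.items():
        candidates = templates.get(SECTIONS.get(key, ""), [])
        match = next((t for t in candidates if t.get("name") == name), None)
        if match is None:
            missing.append((key, name))
        else:
            resolved[key] = match
    return resolved, missing


spec = {
    "defaults": {"templates": {"volumeClaimTemplate": "persistent",
                               "podTemplate": "clickhouse:19.6",
                               "serviceTemplate": "minikube"}},
    "templates": {"volumeClaimTemplates": [
        {"name": "persistent",
         "spec": {"accessModes": ["ReadWriteOnce"],
                  "resources": {"requests": {"storage": "10Gi"}}}}]},
}
resolved, missing = resolve_defaults(spec)
print(sorted(resolved), missing)
```

Here only the volume claim template resolves; the pod and service defaults are reported as missing because this fragment does not define them.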
19. Templates can be simple, too
defaults:
  templates:
    volumeClaimTemplate: persistent
    podTemplate: clickhouse:19.6
    serviceTemplate: minikube
templates:
  podTemplates:
    - name: clickhouse:19.6   # Name of template
      spec:
        containers:
          - name: clickhouse-pod
            image: yandex/clickhouse-server:19.6.2.11
# Most values take defaults
21. Versatile mapping to different deployments
Diagram: one ClickHouse Resource Definition maps to either a Minikube deployment or a Multi-AZ deployment, each made up of Pods behind Load Balancers.
(Differences mostly in templates)
22. Changes are recognized automatically
defaults:
  templates:
    volumeClaimTemplate: persistent
    podTemplate: clickhouse:19.11      # Make new version the default
    serviceTemplate: minikube
templates:
  podTemplates:
    - name: clickhouse:19.11           # Define template for new version
      spec:
        containers:
          - name: clickhouse-pod
            image: yandex/clickhouse-server:19.11.3.11
23. Upgrade runs while service is online
Diagram: the ClickHouse Resource Definition is updated and applied; the ClickHouse Operator compares the resource to actual state and upgrades pods chi-0-0, chi-0-1, chi-1-0, and chi-1-1 sequentially.
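The compare-then-upgrade flow can be sketched as a small reconcile step. This is a minimal illustration, not the operator's actual code: `restart_pod` is a hypothetical callback standing in for the real Kubernetes calls. Stopping at the first failure is a design choice made here so that the remaining replicas keep serving.

```python
OLD = "yandex/clickhouse-server:19.6.2.11"
NEW = "yandex/clickhouse-server:19.11.3.11"


def plan_upgrade(desired_image, actual_images):
    """Compare desired to actual state: pods whose image differs, in name order."""
    return [pod for pod, image in sorted(actual_images.items())
            if image != desired_image]


def rolling_upgrade(desired_image, actual_images, restart_pod):
    """Upgrade one pod at a time; stop at the first pod that fails to restart."""
    upgraded = []
    for pod in plan_upgrade(desired_image, actual_images):
        if not restart_pod(pod, desired_image):
            break
        actual_images[pod] = desired_image
        upgraded.append(pod)
    return upgraded


def fake_restart(pod, image):
    """Stand-in for the real restart; always succeeds."""
    return True


actual = {pod: OLD for pod in ("chi-0-0", "chi-0-1", "chi-1-0", "chi-1-1")}
upgraded = rolling_upgrade(NEW, actual, fake_restart)
second_run = rolling_upgrade(NEW, actual, fake_restart)
print(upgraded, second_run)
```

Running the step a second time finds nothing to do, which is the property that lets an operator re-apply the same resource definition safely.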
27. Surprise! DNS is different in Kubernetes
Diagram: Pods chi-0-0, chi-0-1, chi-1-0, and chi-1-1 resolve names through the Core DNS Server; chi-0-1 restarts.
Pod restart invalidates cluster DNS mappings
Name resolution deadlock at startup:
Must resolve host name to start up
Won’t resolve host until pod starts
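One generic way to tolerate names that are briefly unresolvable is to retry resolution with a pause instead of failing on the first miss. The slides do not say how the operator handles this, so treat the sketch below as an assumption; the function name and parameters are hypothetical. The resolver and sleep function are injectable so the demo runs without a network.

```python
import socket
import time


def wait_for_host(host, resolve=socket.gethostbyname, attempts=5,
                  delay=0.5, sleep=time.sleep):
    """Retry name resolution until it succeeds or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            if attempt == attempts:
                raise
            sleep(delay * attempt)  # linear backoff between tries


# Demo with a fake resolver that fails twice, as a restarting pod might.
answers = iter([OSError("not yet"), OSError("not yet"), "10.0.0.7"])


def fake_resolve(host):
    result = next(answers)
    if isinstance(result, Exception):
        raise result
    return result


pauses = []
address = wait_for_host("chi-0-1", resolve=fake_resolve, sleep=pauses.append)
print(address, pauses)
```

Retrying does not remove the startup deadlock on its own; it only helps where the name becomes resolvable shortly after the pod starts.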
28. Kubernetes overhead is minimal (whew!)
Charts: Cluster deploy and load; Query comparison
Redshift dc2.large vs. Kubernetes EC2 r5.xlarge with EBS (st1)
29. No surprise: error handling is complicated
Diagram: ClickHouse Operator, ClickHouse Resource Definition, Kubernetes, Storage Provider
Sources of complexity: complex specification, asynchronous execution, local semantics
30. Biggest challenge
Data warehouses are not cattle
Losing/compromising data is bad
Safety is paramount
Security, migration, and availability require logic above the level of the operator
31. Biggest gain
Kubernetes democratizes data warehouse access
Set up complex configurations in minutes
Map data warehouse flexibly to resources
Integrate easily with other services
33. Conclusions
● Kubernetes operators set up DW from single specification
● ClickHouse experience validates Kubernetes value:
○ Every application can have a data warehouse!
○ Portable
○ Fast deployment
○ Flexible resource management
● Kubernetes operator alone is not enough for all use cases
34. Future Work
● Data warehouse as a service on Kubernetes
○ Multi-tenancy
○ Data availability
○ Security
○ Optimized resource utilization
● Extend ClickHouse to match cloud native execution model
○ Decouple storage and compute
○ Rebalance data on scale-up/down