A small walkthrough of projects within the Dutch government running Data(bases) on OpenShift. This talk shares success stories, provides a proven recipe to `get it done` and debunks some of the FUD.
About Sebastiaan:
I have always been a weird DBA, trying to combine databases with out-of-the-box thinking and a DevOps mindset. Around 2016 I fell in love with both Postgres and Kubernetes, and I have since committed my life to helping Dutch organisations run their database workloads CloudNative.
Over the last few years I have worked as a private contractor for 2 large government agencies doing exactly that, and I want to share my own and others' success stories, hoping to enable and inspire Data on Kubernetes adoption.
2. Subjects of today
● Short introduction
● Break the glass
● Database performance on CEPH
● Some other ideas
● General recommendations
● Conclusion and takeaways
4. Introduction
● Database space since 2000, CloudNative enthusiast since 2015
● Worked for DJI, bol.com, EDB, RIVM, MannemSolutions
● Contributor
○ Since forever
○ pg_hba, Stolon, wal-g, bitbucket-cli, and more
○ pgfga, pgroute66, pgquartz, and more
● Dreamer
5. Mission (Mannem Solutions)
Enable organizations to be successful with modern Open Source Data Solutions.
Gov and Private 100% Open Source!!!
Early adopter
Specialized in Data
8. Authentication - LDAP is not very CloudNative
Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic
environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable
infrastructure, and declarative APIs exemplify this approach.
These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with
robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.
The Cloud Native Computing Foundation seeks to drive adoption of this paradigm by fostering and sustaining an ecosystem
of open source, vendor-neutral projects. We democratize state-of-the-art patterns to make these innovations accessible for
everyone.
https://github.com/cncf/toc/blob/main/DEFINITION.md
10. Authentication - Natural drift from CN
(Diagram: Kubernetes (CPU, memory, storage, connectivity) hosting Projects, each with an App and DB1, plus Federated Authentication.)
1: Normal operation: only the App connects
2: Investigating issues: humans connect
3: Fix issue: new release of the App!!!
Humans can always connect = removes the need to fix with the App
Seems like too much hassle for exception resolution
11. Authentication
Let’s do it all with client certificates:
● Specific for project
● Not relying on external components
● Manageable with operators (Very CN)
● Short lived (Ideal for Break The Glass)
● Decoupled
○ Private key not required for verifying the cert
○ Handing out cert does not require Postgres access
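A minimal Python sketch of the `short lived` property, for example inside an audit or break-the-glass helper script. It assumes a hypothetical policy of at most one hour per certificate and only checks validity times; issuing the certificate (by an operator or otherwise) and the matching PostgreSQL `cert` authentication rule are out of scope. The timestamps use the OpenSSL text format that Python's `ssl` module reports for `notBefore`/`notAfter`, parsed with the standard `ssl.cert_time_to_seconds`.

```python
import ssl
import time
from typing import Optional

# Hypothetical policy: a break-the-glass client certificate may be valid
# for at most one hour. Adjust to your own audit requirements.
MAX_LIFETIME_SECONDS = 60 * 60


def break_glass_cert_ok(not_before: str, not_after: str,
                        now: Optional[float] = None,
                        max_lifetime: int = MAX_LIFETIME_SECONDS) -> bool:
    """Return True if the certificate is short lived AND valid right now.

    not_before / not_after use the OpenSSL text format reported by
    ssl.SSLSocket.getpeercert(), e.g. "Jan  5 09:00:00 2024 GMT".
    """
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    short_lived = 0 < end - start <= max_lifetime
    # X.509 validity runs from notBefore through notAfter, inclusive.
    currently_valid = start <= now <= end
    return short_lived and currently_valid


if __name__ == "__main__":
    t0 = ssl.cert_time_to_seconds("Jan  5 09:00:00 2024 GMT")
    # One-hour certificate, checked 10 minutes after issue: accepted.
    print(break_glass_cert_ok("Jan  5 09:00:00 2024 GMT",
                              "Jan  5 10:00:00 2024 GMT", now=t0 + 600))  # True
    # Certificate valid for a whole day: rejected as not short lived.
    print(break_glass_cert_ok("Jan  5 09:00:00 2024 GMT",
                              "Jan  6 09:00:00 2024 GMT", now=t0 + 600))  # False
```

Because the check only needs the public certificate data, it fits the `decoupled` point above: no private key and no Postgres access are needed to evaluate it.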
13. Authentication - Break the glass is very CloudNative
18. Let’s test - What’s going on?
cpu: 2
memory: 1GiB
scale: 100
clients: 80
threads: 10
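The scale, clients and threads values map naturally onto pgbench options (`-s`, `-c`, `-j`); cpu and memory are pod resources, not pgbench options. A minimal Python sketch that only builds the command lines, assuming a hypothetical host and database name and a 60-second run (`-T 60`), which the slide does not state.

```python
import shlex
from typing import List

# Benchmark settings from the slide; cpu and memory are pod resources,
# not pgbench options, so they do not appear here.
SCALE = 100
CLIENTS = 80
THREADS = 10


def pgbench_init_cmd(host: str, dbname: str, scale: int = SCALE) -> List[str]:
    """Create and fill the pgbench tables (scale 100 = 10 million account rows)."""
    return ["pgbench", "-i", "-s", str(scale), "-h", host, dbname]


def pgbench_run_cmd(host: str, dbname: str, clients: int = CLIENTS,
                    threads: int = THREADS, seconds: int = 60) -> List[str]:
    """Run the built-in TPC-B-like script, printing progress every 10 s."""
    return ["pgbench", "-c", str(clients), "-j", str(threads),
            "-T", str(seconds), "-P", "10", "-h", host, dbname]


if __name__ == "__main__":
    # Hypothetical target: a read-write service "mydb-rw" and database "app".
    for cmd in (pgbench_init_cmd("mydb-rw", "app"),
                pgbench_run_cmd("mydb-rw", "app")):
        print(shlex.join(cmd))
```

The scale is only passed to the initialisation step; during a run pgbench takes the scale from the existing tables.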
Let’s do some more tests:
● Separate disk for WAL (which you should always do)
● Add resources
● Improve storage performance (the wrong way)
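The first two follow-up tests are declarative changes in CloudNativePG (the operator named in the conclusion slide): a dedicated WAL volume (`walStorage`) and pod `resources` in the `Cluster` spec. A sketch that renders such a manifest from Python as JSON (which `oc apply -f` also accepts); the cluster name, volume sizes and storage class name are hypothetical.

```python
import json
from typing import Any, Dict


def cnpg_cluster(name: str, cpu: str = "2", memory: str = "1Gi",
                 storage_class: str = "ceph-rbd",  # hypothetical class name
                 data_size: str = "20Gi", wal_size: str = "5Gi") -> Dict[str, Any]:
    """Build a CloudNativePG Cluster manifest with a dedicated WAL volume."""
    resources = {"cpu": cpu, "memory": memory}
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "instances": 1,
            # Pod resources: the "Add resources" test changes these values.
            "resources": {"requests": dict(resources), "limits": dict(resources)},
            # Data files on their own PVC ...
            "storage": {"size": data_size, "storageClass": storage_class},
            # ... and WAL on a separate PVC (the "separate disk for WAL" test).
            "walStorage": {"size": wal_size, "storageClass": storage_class},
        },
    }


if __name__ == "__main__":
    print(json.dumps(cnpg_cluster("pgbench-test"), indent=2))
```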
31. Dutch Gov (On Premise)
Why? - Some background
● Classic: VMware / VMDK SAN storage
● CloudNative: OpenShift / CEPH
(Chart: DB performance (TPS) and storage performance (fsync latency), Classic vs CloudNative)
32. Why? Is it data or WAL?
Log (WAL):
● stream based
● FS: mostly write
● write: transaction inline
Data:
● block based and memory cached
● FS: more read
● write: direct in memory, checkpoint in the background
(Diagram: Database > Memory > Filesystem/Storage for data and apply log; each log sync costs ΔT latency.)
33. Storage performance (On premise)
Data (a distribution centre):
● Mostly read
○ and ingest write
● In memory > checkpoint (background process)
● Block based
Apply log (an assembly line):
● Mostly write
○ ingest write also in log
● Update is inline for TPS
● Stream based
34. Why? Ceph vs DAS
● Log: stream based, FS mostly write, write on commit (transaction inline)
● Data: block based and memory cached, FS more read, write direct in memory, checkpoint in the background
● Each log sync to storage costs ΔT (latency)
Classic - DAS: Filesystem > kernel/driver > SCSI-BUS > SSD (Lower latency?)
CloudNative - CEPH: Filesystem > kernel/driver > tcp network > node software > driver/kernel > SCSI-BUS > SSD, on multiple storage nodes (Higher latency?)
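`Lower latency?` versus `Higher latency?` can be measured. PostgreSQL ships `pg_test_fsync` for this; as a rough, self-contained alternative, this Python sketch times repeated write+`os.fsync` cycles of one 8 KiB page in a directory you choose (the system temp dir below is only a placeholder for a directory on the volume under test). It also derives a per-client bound: with synchronous commit, each commit waits for at least one WAL flush, so one client committing serially cannot exceed about 1 / latency commits per second. Group commit lets concurrent clients share a flush, so this is not a cluster-wide limit.

```python
import os
import statistics
import tempfile
import time
from typing import List

WAL_PAGE = b"\0" * 8192  # PostgreSQL's default WAL page size is 8 KiB


def fsync_latencies(directory: str, rounds: int = 100) -> List[float]:
    """Time `rounds` write+fsync cycles of one 8 KiB page, in seconds."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        samples = []
        for _ in range(rounds):
            start = time.perf_counter()
            os.lseek(fd, 0, os.SEEK_SET)  # overwrite the same page in place
            os.write(fd, WAL_PAGE)
            os.fsync(fd)                  # wait until storage acknowledges it
            samples.append(time.perf_counter() - start)
        return samples
    finally:
        os.close(fd)
        os.unlink(path)


def serial_commit_ceiling(latency_s: float) -> float:
    """Upper bound on commits/s for ONE client that waits for every flush."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return 1.0 / latency_s


if __name__ == "__main__":
    lat = fsync_latencies(tempfile.gettempdir(), rounds=50)
    med = statistics.median(lat)
    print(f"median write+fsync: {med * 1000:.3f} ms, so at most "
          f"~{serial_commit_ceiling(med):.0f} commits/s per serial client")
```

Running it once on a DAS-backed VM disk and once on a Ceph-backed PVC gives a like-for-like comparison of the two stacks.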
35. Conclusion - CloudNativePG on CEPH
● Separating WAL does not help
Probably due to all sync IO on WAL volume
● Increasing CPU and Memory has no effect.
Storage IOPS is the limiting factor?
● Disabling fsync increases performance
Yes, Storage IOPS is the limiting factor!!!
36. Why is 1300 enough?
Classic: APP > DB (data + log, memory, FS) on SAN
MicroService: APP split into microservices, each with its own database (DB1, DB2, DB3; data + log, memory, FS) on CEPH
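The reasoning behind `Why is 1300 enough?` as a small calculation: in the classic setup one database has to absorb the transaction rate of the whole application, while with microservices each database only handles its own service's share. A Python sketch with hypothetical per-service demands, checked against the roughly 1300 TPS this talk reports for CloudNativePG on Ceph. It assumes each database reaches that ceiling independently of the others, which on a shared Ceph cluster is itself worth testing.

```python
from typing import Dict

# Per-database ceiling reported in this talk for CloudNativePG on Ceph.
CEPH_TPS_CEILING = 1300


def fits_classic(demands: Dict[str, int], ceiling: int = CEPH_TPS_CEILING) -> bool:
    """Classic: one database serves every service, so the demands add up."""
    return sum(demands.values()) <= ceiling


def fits_microservices(demands: Dict[str, int],
                       ceiling: int = CEPH_TPS_CEILING) -> bool:
    """Microservices: each service has its own database and ceiling."""
    return all(tps <= ceiling for tps in demands.values())


if __name__ == "__main__":
    # Hypothetical peak demand per service, in transactions per second.
    demand = {"DB1": 900, "DB2": 700, "DB3": 400}
    print(fits_classic(demand), fits_microservices(demand))  # False True
```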
38. How can we fix it?
Introduce storage with faster fsync performance
● We could still run on VM
But wait…
● Our OpenShift clusters run on VMware.
Can’t we use VMDK (Tanzu/vSphere CNS)?
So, I talked to the architect, and
43. So let’s build a tool
● `How much CPU` presentation (PGConf.EU 2019)
○ Rust (and Go), multithreaded
○ only TPS, #clients was fixed, no SSL
○ just getting started programming in Rust
● pg_tps_optimizer
○ Rust, multithreaded
○ TPS and latency
○ #clients increase (Fibonacci)
○ SSL and client certs support
○ still just getting started programming in Rust, but neovim / primeagen
○ Work in progress, not there yet :(
https://www.postgresql.eu/events/pgconfeu2019/sessions/session/2797/slides/199/2019%20How%20much%20CPU%2060.000%20TPS%20(PGCONF).pdf
https://github.com/MannemSolutions/pg_tps_optimizer
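pg_tps_optimizer itself is written in Rust; this Python sketch only illustrates the idea of a Fibonacci client ramp. The start values (1, 2), the stop rule (stop at the first step where TPS no longer improves) and the `measure` callback, which would run a benchmark for a given client count and return its TPS, are hypothetical and not taken from the tool.

```python
from typing import Callable, Iterator, Tuple


def fibonacci_clients(max_clients: int) -> Iterator[int]:
    """Yield client counts 1, 2, 3, 5, 8, ... up to and including max_clients."""
    a, b = 1, 2
    while a <= max_clients:
        yield a
        a, b = b, a + b


def sweep(measure: Callable[[int], float], max_clients: int) -> Tuple[int, float]:
    """Return (clients, tps) of the best step, stopping once TPS stops rising."""
    best = (0, 0.0)
    for clients in fibonacci_clients(max_clients):
        tps = measure(clients)
        if tps <= best[1]:
            break
        best = (clients, tps)
    return best


if __name__ == "__main__":
    print(list(fibonacci_clients(100)))  # [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

Fibonacci steps are dense at low client counts, where TPS changes fastest, and then grow by roughly a factor of 1.6 per step, so high concurrency is reached in few runs.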
44. Some other issues and solutions
Issues we ran into and how we fixed them
45. Other examples / ideas
● bitbucket-cli
● Project Quota
○ Image with oc, tkn, etc.: 238MB, 210 vulnerabilities (2 Critical, 16 High)
○ BYO: 13.69MB, 0 vulnerabilities
● Break the glass functionality
○ Temporary access, with proper auditing
● Pipeline runner
○ read pipeline definition > set params from environment > start it
○ More flexible, smaller image, 0 vulnerabilities
● Image puller
○ Pull image with all tags, image LCM
○ Easy to configure
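The first two steps of the pipeline runner (read pipeline definition > set params from environment) as a Python sketch. It assumes the definition is a Tekton Pipeline already parsed into a dict (Tekton declares parameters under `spec.params` with a `name` and optional `default`) and uses a hypothetical `PARAM_<NAME>` environment variable convention. Starting the run is left out; the result is the list of `name`/`value` pairs a Tekton PipelineRun takes under `spec.params`.

```python
import os
from typing import Any, Dict, List, Mapping, Optional


def params_from_env(pipeline: Dict[str, Any],
                    env: Optional[Mapping[str, str]] = None) -> List[Dict[str, Any]]:
    """Resolve every declared pipeline param from the environment.

    Hypothetical convention: param "git-url" is read from PARAM_GIT_URL.
    Falls back to the param's default; raises KeyError if neither is set.
    """
    env = os.environ if env is None else env
    resolved = []
    for param in pipeline.get("spec", {}).get("params", []):
        name = param["name"]
        var = "PARAM_" + name.upper().replace("-", "_")
        if var in env:
            value = env[var]
        elif "default" in param:
            value = param["default"]
        else:
            raise KeyError(f"no value for pipeline param {name!r} (set {var})")
        resolved.append({"name": name, "value": value})
    return resolved
```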
46. I wanna invite everyone to join our adventure
Great, how?
47. Think CloudNative
48. Suggestion: Open Source vs Tender
● POC with Open-Source
● Acquire support
○ If you need it
○ But only after you have built it
● Only acquire support for Open Source
● GPL is fine
○ Unless you want to change it, not upstream your changes, but do want to redistribute (you naughty boy)
49. DIY vs Support
1. Make it work
○ Use Open Source
○ Investigate multiple options
2. Make it right
○ Do you need support?
○ Request support for your solution, instead of a solution with support
3. Make it perform
○ Don’t overdo it
■ aim for similar performance
■ only as required (production=yes, cicd=no, be smart)
○ Microservice architecture helps big time
○ Use the same storage as on the original architecture
51. Conclusions
Dutch government is embracing CloudNative
Running CloudNative databases on OpenShift with Ceph
● is doable
● up to 1300 TPS (maybe higher)
● In a CN environment that might be enough
● If it isn’t, there are options
Key takeaways
● Think CloudNative
● Don’t be afraid to build your own
○ You will learn what you really need, then acquire it, not before you know
● Open your Source
● Decide on
○ Expectations you have
○ Investments you are willing to put in:
■ Effort
■ Money if needed
53. Using Open Source helps
● POC with Open-Source
● Acquire support IF YOU NEED IT
● Get support on the Open Source solution
● GPL is only an issue if you want to change and redistribute without contributing your changes
55. Performance
● Performance, latency, fsync
● Performance, microservices distribution
● Need support
● Air gapped
● The power of CICD
● Pets vs Cattle
● Disaster Recovery
56. The idea is to talk about Dutch Gov, Databases on K8s, and their challenges.
The theme would be to think CloudNative, which is especially difficult in this combination:
● Databases are usually pets, but CN thinking is cattle
● Dutch gov usually has a more classical approach, whereas CN is a more modern approach (I will leave this out, but it definitely is there)
So I wanted to touch on the following subjects:
● Storage performance, pets approach: can I create a huge database which requires a gazillion TPS?
○ Test what performance is achievable and you will be surprised
○ Baby steps: start with the same storage as the database VMs
○ A CN/cattle approach would be divide and conquer,
which definitely helps bring down requirements and increases performance of the total system
● Authentication
○ Don't think like a classical DBA who accesses all databases for manual tuning, etc.
The app is the SPOC for the DB, and as such, development on the app is how DB changes are applied, but that requires access.
○ The general approach would be to use federated auth like LDAP, but LDAP is not really a CN approach
○ Introducing an alternative approach:
Break the glass option: client certs as short-lived auth tokens for short-term DBA access where you need it
● Backups
○ The classic approach is 'backup everything'
That dogma is severely limiting. Think out of the box
■ Do you want DBs in CI/CD pipelines? Do they need backups?
■ Do you want short-lived databases that run a workload
(e.g. generate reports from raw data)? Do they need backups?
○ Make sure you can restore your data
■ emphasis on can: make sure it is an option, not a rule of thumb
■ emphasis on restore: 'dump and load',
but also 'rebuild from other datasources' or even generating a new test set can be valid options too
● CMDB?
○ For short-lived databases, use existing inventories instead
57. Ideas
● Performance test
○ Azure VM, Azure K8s
○ In-House, OpenShift, CEPH
● Questionnaire
Title: Implementing data and databases on K8s within the Dutch government.
58. ● Why am I trying to get DoK adoption in Dutch Gov?
● Dutch government is adorable
○ Rogers: Laggards trying to become late majority
○ You are too late to have us approve your submission
○ Have we bought support?
○ Computable
● Data on Kubernetes Paradox
○ Data (stateful by nature)
○ On Kubernetes (Stateless by nature)
● Challenges
○ Multicluster
59. Who am I
● Masochistic by nature
○ I will stir up the beehive if necessary
○ `Challenge accepted` mentality
● Out of the box thinking
○ In the box is BBBOOORRRIIINNNGGG
● `Just do it` mentality
○ I’m okay doing the heavy lifting if necessary
○ I’d rather do it than think about why we shouldn’t
● Ideology
○ Government is there for all of us, so let’s enable them
memes created with: https://imgflip.com/memegenerator
60. Who am I - Masochistic by nature
Stir up the beehive if necessary
`Challenge accepted` mentality
memes created with: https://imgflip.com/memegenerator
64. Data On Kubernetes Paradox
Image downloaded from: https://forgottenbytheworld.blogspot.com/2012/02/abandoned-leaning-house.html
● Data is stateful by nature
● Kubernetes is Stateless by nature
How can you build on a non-solid foundation???
66. Dutch government is adorable
● You are too late to have us approve your submission
● Avoid Computable at all costs
● Have we bought support?
● Average age of 58 (Gov)
67. Storage - do’s and don’ts
● Databases love fsync’able storage
○ fsync should be fast (short roundtrip)
○ fsync should be trustworthy (when fsync says it is ok, it must be ok)
● Do’s
○ Use block storage
○ Use same storage as on VM
■ VMware Tanzu to have VMDK on k8s
■ Or also use CEPH on VM (if you also use it on k8s)
● Don’ts
○ Don’t use NFS!!!
○ Don’t use
68. Disaster recovery
(Diagram: a Global Load Balancer in front of K8s Cluster 1 (primary cluster) and K8s Cluster 2 (replica cluster, DR).)
Options:
● Run database outside of k8s
● Use native RDBMS capabilities
○ cloudnative-pg.io: replica-clusters
○ Crunchy PGO: streaming standby
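For the `native RDBMS capabilities` option with CloudNativePG, the DR side is a second `Cluster` in replica mode: it bootstraps from the primary with `pg_basebackup` and then keeps following it, with the primary declared under `externalClusters`. A Python sketch that renders such a manifest; the cluster names, host and secret names are hypothetical, and field names should be checked against the replica cluster documentation of the CloudNativePG version you run. Note that replication again authenticates with client certificates.

```python
import json
from typing import Any, Dict


def cnpg_replica_cluster(name: str, source: str, source_host: str,
                         instances: int = 3) -> Dict[str, Any]:
    """DR Cluster that streams from `source`, running in another K8s cluster."""
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "instances": instances,
            "storage": {"size": "20Gi"},  # hypothetical size
            # Initial copy of the primary via pg_basebackup ...
            "bootstrap": {"pg_basebackup": {"source": source}},
            # ... then keep following it as a replica cluster.
            "replica": {"enabled": True, "source": source},
            "externalClusters": [{
                "name": source,
                "connectionParameters": {
                    "host": source_host,
                    "user": "streaming_replica",
                    "dbname": "postgres",
                    "sslmode": "verify-full",
                },
                # Client certificate auth; the (hypothetically named) secrets
                # hold certs copied over from the primary side.
                "sslKey": {"name": f"{source}-replication", "key": "tls.key"},
                "sslCert": {"name": f"{source}-replication", "key": "tls.crt"},
                "sslRootCert": {"name": f"{source}-ca", "key": "ca.crt"},
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(cnpg_replica_cluster("pg-dr", "pg-prod",
                                          "pg-prod.example.org"), indent=2))
```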