My talk from the Kubernetes Boston Meetup on Feb 20, 2018. I talked about how CloudHealth is using Kubernetes both in production and in development to increase velocity without sacrificing quality.
Kubernetes: Increasing velocity without sacrificing quality
1. Using Kubernetes to increase
developer velocity
(without sacrificing quality)
Adam Schepis
Architect @ CloudHealth Technologies
2. About Me
● At CloudHealth I build things in the cloud
that help our customers to confidently
build things in the cloud.
● I love working on distributed systems with
high scalability requirements.
● I have met Spiderman.
@aschepis
5. Our Challenges
Innovation in the Cloud
● AWS - 100+ feature announcements in 2018
● Azure - 13 announcements (chunkier)
● GCP - Next '18 in July (more than 100
announcements last year)
6. CloudHealth and Kubernetes
● Started our Kubernetes journey in early
2017
● Running a number of production workloads
● Kubernetes is a key component of platform
overhaul in 2018
I'm Adam
Architect at CloudHealth
What gets me excited in the morning is building systems (often distributed) with high scalability requirements.
We have grown (a lot!)
30 -> 260 in 3 years
engineering 10 -> 70
code "grew organically" with us
More devs + big, complex platform + tribal knowledge = a drag on velocity
Market has matured
QUALITY!
Our customers aren't early adopters any more
No tolerance for product or data quality issues
Innovation in the Cloud
VELOCITY!
AWS - 100+ announcements in the first 6 weeks of 2018
Azure - 13 very chunky announcements
Hybrid Cloud/Datacenters
enterprise customers ask for this
cloud + datacenter will exist for the foreseeable future in large enterprises
International growth
Alibaba
supporting many currencies
Decided on k8s in early 2017
Evaluated ECS, Mesos, Docker Swarm
We run production workloads
for background analytics and batch jobs
for serving data in mainline customer requests in application
Kubernetes is one of the backbones of our platform overhaul strategy in 2018
We use helm (wrapped in some light custom tooling) for
managing service lifecycles
It has worked very well
Canary deployments were a bit painful
I would love to talk to people who use helm and do canary deploys
Linkerd for our service mesh
daemonset in k8s
teams don't have to worry about deploying sidecars
(Platform team doesn't have to run around explaining why they should)
we get distributed tracing via the zipkin telemeter
Romana
CNI
Originally used Weave but had some issues as the cluster approached 50 nodes
this may have been our inexperience
Romana has been beneficial for us since we are on AWS and it intelligently manages route tables for us, avoiding limitations imposed by AWS
Like pretty much everyone else we also use a whole bunch of other technologies for building, delivering, and monitoring services.
Dev Cluster
shared by engineering team
each engineer has their own namespace that they can deploy to
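A minimal sketch of the namespace-per-engineer idea, assuming namespaces are named after the engineer's username. The `dev-` prefix, the `owner` label, and both function names are hypothetical and not taken from the talk.

```python
import re


def namespace_name(username: str, prefix: str = "dev") -> str:
    """Derive a Kubernetes-safe namespace name (a DNS-1123 label) from a username."""
    # Lowercase, then collapse anything outside [a-z0-9-] into a single dash.
    slug = re.sub(r"[^a-z0-9-]+", "-", username.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive a namespace from {username!r}")
    # DNS-1123 labels are limited to 63 characters.
    return f"{prefix}-{slug}"[:63].rstrip("-")


def namespace_manifest(username: str) -> dict:
    """Build a Namespace manifest as a plain dict (serialise to YAML or JSON as needed)."""
    name = namespace_name(username)
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            # Hypothetical label so tooling can find who owns the namespace.
            "labels": {"owner": name},
        },
    }
```

Deriving the name from a function keeps every tool that touches the shared cluster in agreement about where a given engineer's deployments live.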
Golang
built for macOS, linux
light wrapper
enough to make things faster, but not so heavy that you can't see under the covers
ch init
set up dev env
set up dev tools (kubectl, helm, ...)
minikube (not by default anymore)
Self-provision access to the development/staging cluster through Google auth
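As an illustration of the kind of check an init step like this might perform, here is a small sketch that reports which of the named dev tools are missing from the PATH. It is not the actual `ch` implementation (which is written in Go); `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, and only the tools named in the notes are listed.

```python
import shutil

# Only the tools named in the notes; the real list is longer and not known here.
REQUIRED_TOOLS = ("kubectl", "helm")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing dev tools: " + ", ".join(missing))
    else:
        print("all dev tools found")
```

The `which` parameter exists only so the lookup can be swapped out in tests.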
Adding service
http_proxy
No native support on Node. 😢
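With Linkerd running as a daemonset, one common way to route a service's outbound HTTP through the node-local proxy is to set `http_proxy` in the container environment. The sketch below builds those environment entries as plain dicts. Port 4140 and the function name are assumptions for illustration, not details from the talk.

```python
def linkerd_proxy_env(port: int = 4140) -> list:
    """Container env entries that point http_proxy at the Linkerd on the same node."""
    return [
        # The downward API exposes the name of the node the pod is scheduled on.
        {
            "name": "NODE_NAME",
            "valueFrom": {"fieldRef": {"fieldPath": "spec.nodeName"}},
        },
        # Kubernetes expands $(NODE_NAME) because it is defined earlier in the list.
        {"name": "http_proxy", "value": f"$(NODE_NAME):{port}"},
    ]
```

HTTP clients that honour `http_proxy` pick this up without code changes. Runtimes whose HTTP client ignores the variable, as noted for Node, have to be pointed at the proxy explicitly.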
Reasons for failures
Unit test failure
integration test failure
performance regressions
contract validation failures
What can a dev do
Because the failing build still lives in a namespace, a dev can inspect the running service, perform tests, etc.
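The keep-on-failure behaviour can be sketched as a small decision function. This is an illustration under assumptions, not the actual pipeline: `BuildResult` and `next_action` are hypothetical names, and cleaning up the namespace after a passing build is assumed rather than stated in the talk.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BuildResult:
    namespace: str
    # Names of the gates that failed, e.g. "unit tests" or "contract validation".
    failed_checks: Tuple[str, ...] = ()


def next_action(result: BuildResult) -> str:
    """Decide what happens to a build's namespace once its checks have run."""
    if result.failed_checks:
        reasons = ", ".join(result.failed_checks)
        # Leave the namespace running so a developer can inspect the service.
        return f"keep namespace {result.namespace} for inspection ({reasons})"
    # Assumption: a passing build no longer needs its namespace.
    return f"clean up namespace {result.namespace}"
```

For example, `next_action(BuildResult("build-42", ("unit tests",)))` returns a message that keeps `build-42` alive for debugging.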