Developer experience is about reducing friction between creating an idea and delivering business value in production. It includes factors like lead time, deployment frequency, and monitoring. DevEx has three components: workflow, platforms, and the developer experience itself. The ideal workflow follows progressive delivery principles. Teams should focus first on automating the inner development loop and CI/CD processes, then on observability and best practices. Questions around local vs cluster development, verification approaches, and whether to provide guardrails can help guide platform design decisions.
3. @danielbryantuk
Independent Technical Consultant, Product Architect at Datawire
Previously: Academic, software developer (from startups to gov),
architect, consultant, CTO, trainer, conference tourist…
Leading change through technology and teams
5. Developer Experience (DevEx) is about...
“...reducing engineering friction between creating a hypothesis, to
delivering an observable experiment (or business value) in production”
- Adrian Trenaman (SVP Engineering, HBC)
https://www.infoq.com/news/2017/07/remove-friction-dev-ex
6. DevEx isn’t new, but it is important
● Lead time
● Deployment frequency
● Mean time to restore (MTTR)
● Change fail percentage
● Rapid provisioning
● Basic monitoring
● Rapid app deployment
https://martinfowler.com/bliki/MicroservicePrerequisites.html
16. Fundamental questions
Do you understand your domain?
Is your problem domain complex?
Do you have product/market fit?
Is your solution event-driven (and simple)?
Should you be adding value elsewhere?
17. SOLID K8s: Open for Extension...
● Kubernetes becoming de facto CoaaS (the new cloud broker?)
○ Lots of hosted options
● Know the extension points
○ Custom Resources & Controllers
○ Operators
○ operatorhub.io (kudos to Red Hat)
● Extension enables custom workflow
○ “Kubernetes Custom Resource, Controller and Operator Development Tools”
22. Develop and test services locally, or
within the cluster (or both)?
● Working locally has many advantages
○ Reduce ops cost of multi-cluster
● However, some systems are simply too
large to run locally (for integration tests)
● Local/remote container dev tools like
Telepresence and Squash allow hybrid
25. How do you want to verify your system?
● Pre-prod testing in distributed systems
○ Dealing with complex adaptive systems
○ Probabilistic guarantee of “correctness”
https://medium.com/@copyconstruct/testing-microservices-the-sane-way-9bb31d158c16
● Traffic shaping/splitting is powerful
○ Canarying
○ Shadowing
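To make canarying and shadowing concrete, here is a minimal Python sketch of weighted traffic splitting. It is an illustration only: in practice this logic usually lives in an L7 proxy rather than in application code, and `make_router`, `canary_weight`, and the handler signatures are hypothetical names invented for this example.

```python
import random
from typing import Callable, Optional

Handler = Callable[[str], str]


def make_router(
    stable: Handler,
    canary: Handler,
    canary_weight: float,
    shadow: Optional[Callable[[str], object]] = None,
    rng: Optional[random.Random] = None,
) -> Handler:
    """Return a handler that splits traffic between stable and canary.

    canary_weight is the fraction of requests (0.0 to 1.0) sent to the canary.
    If shadow is given, every request is also copied to it and its result
    (or failure) is discarded, so users never see shadow behaviour.
    """
    if not 0.0 <= canary_weight <= 1.0:
        raise ValueError("canary_weight must be between 0.0 and 1.0")
    rng = rng or random.Random()

    def route(request: str) -> str:
        if shadow is not None:
            try:
                shadow(request)  # mirrored copy; the result is ignored
            except Exception:
                pass  # a failing shadow must not affect the live response
        # rng.random() is in [0.0, 1.0), so weight 0.0 never picks the canary
        # and weight 1.0 always does.
        backend = canary if rng.random() < canary_weight else stable
        return backend(request)

    return route


if __name__ == "__main__":
    seen_by_shadow = []
    router = make_router(
        stable=lambda req: "v1:" + req,
        canary=lambda req: "v2:" + req,
        canary_weight=0.1,
        shadow=seen_by_shadow.append,
        rng=random.Random(42),
    )
    responses = [router("GET /") for _ in range(1000)]
    canary_share = sum(r.startswith("v2:") for r in responses) / len(responses)
    print(f"canary share: {canary_share:.1%}, shadowed: {len(seen_by_shadow)}")
```

The shadow call here is synchronous for simplicity; a real mirror would normally be fire-and-forget so that it adds no latency to the live request.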
29. The Importance of L7 (and Envoy)
● “Service-mesh all the things”?
● Old pattern, new technology
○ Allows fine-grained release
● Many control planes for Envoy
○ Ambassador
○ Gloo
○ Istio
○ Consul Connect
https://www.infoq.com/articles/ambassador-api-gateway-kubernetes
31. Canary gotchas (and mitigations)
● Observability is a prerequisite
○ Service Level Indicators (SLIs)
○ Service Level Objectives (SLOs)
○ Key Performance Indicators (KPIs)
● Needs high volume of diverse
(representative) traffic
● Take care with side effects
● Focus on “golden signals”
○ Latency, traffic, errors, saturation
○ Okay to initially “eyeball” data
○ Create actionable alerts
● Load test (with flag header)
● Run synthetic transactions
● Service virtualisation (Hoverfly)
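The observability prerequisite above can be sketched as a simple canary check over two of the golden signals, errors and latency. This is a rough illustration under stated assumptions: the `Sample` type, the `canary_verdict` function, and every threshold (`min_samples`, `max_error_delta`, `max_p99_ratio`) are hypothetical and would need tuning against your own SLOs.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One observed request: its latency and whether it succeeded."""
    latency_ms: float
    ok: bool


def error_rate(samples: List[Sample]) -> float:
    """Fraction of failed requests (the 'errors' golden signal)."""
    return sum(not s.ok for s in samples) / len(samples)


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=99 for p99 latency."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def canary_verdict(
    baseline: List[Sample],
    canary: List[Sample],
    min_samples: int = 100,        # assumed floor for "enough traffic"
    max_error_delta: float = 0.01,  # assumed tolerated rise in error rate
    max_p99_ratio: float = 1.2,     # assumed tolerated p99 latency growth
) -> str:
    """Compare canary against baseline; return a coarse recommendation."""
    # Canarying needs a high volume of representative traffic, so refuse to
    # judge on too few requests (this also avoids dividing by zero).
    if len(baseline) < min_samples or len(canary) < min_samples:
        return "insufficient-data"
    if error_rate(canary) - error_rate(baseline) > max_error_delta:
        return "rollback"
    base_p99 = percentile([s.latency_ms for s in baseline], 99)
    canary_p99 = percentile([s.latency_ms for s in canary], 99)
    if canary_p99 > base_p99 * max_p99_ratio:
        return "rollback"
    return "promote"


if __name__ == "__main__":
    baseline = [Sample(latency_ms=100.0, ok=True) for _ in range(200)]
    slow_canary = [Sample(latency_ms=150.0, ok=True) for _ in range(200)]
    print(canary_verdict(baseline, slow_canary))
```

A check like this only automates the "eyeball" step; traffic and saturation, the other two golden signals, would need data from the platform itself rather than per-request samples.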
33. Do you want to implement “guard rails”
for your development teams?
● Larger teams often want to provide
comprehensive guard rails
● Startups and SMEs may instead value
team independence
● Hybrid? Offer platform, but allow service
teams freedom and responsibility
https://blog.openshift.com/multiple-deployment-methods-openshift/
38. Some thoughts on where to focus...
 | Prototype | Production | Mission Critical
Dev and test | Local / hybrid | Hybrid / local / staged | Local / (hybrid) staged
Release | Canary (synthetic shadow) | Canary / pre-prod test | Pre-prod test / Canary
Guide rails | “YOLO” | Limited | Strong
Where to focus? | Inner development loop & CI/CD | Observability and scaffolding (codifying best practices) | Observability, debugging and “recreatability” (environment & data)
40. In Summary
The developer experience is primarily about minimising the friction from having
an idea, through dev/test and release, to delivering observable business value
How you construct your ‘platform’ impacts the developer experience greatly
You must intentionally curate the experience of: local development, continuous
delivery, release control, observability, debuggability, and more...
...is minimising the distance between a good idea and production
Do you want to implement “guide rails” for your development teams?
Larger teams and enterprises often want to provide comprehensive guide rails for development teams; these constrain the workflow and toolset being used. Doing this has many advantages, such as reducing friction when moving engineers across projects and making it easier to create integrated debug tooling and auditing. The key trade-off is limited flexibility when exceptional circumstances call for a different workflow, such as when a project requires a custom build and deployment or differing test tooling. Red Hat’s OpenShift and Pivotal Cloud Foundry are PaaS offerings that are popular within many enterprise organizations.
Startups and small/medium enterprises (SMEs) may instead value team independence, where each team chooses the most appropriate workflow and developer tooling for them. My colleague, Rafael Schloming, has spoken about the associated benefits and challenges at QCon San Francisco: Patterns for Microservice Developer Workflows and Deployment. Teams embracing this approach often operate a Kubernetes cluster via a cloud vendor, such as Google’s GKE or Azure’s AKS, and utilize a combination of vendor services and open-source tooling.
A hybrid approach, such as that espoused by Netflix, is to provide a centralized platform team and approved/managed tooling, but allow any service team the freedom to implement their own workflow and associated tooling, which they then also have the responsibility for managing. My summary of Yunong Xiao’s QCon New York talk provides more insight into these ideas: The “Paved Road” PaaS for Microservices at Netflix. This hybrid approach is the style we favor at Datawire, and we are building open-source tooling to support this.