Being able to observe the state of a running application is key to understanding a system's behaviour and essential if you want to fix production problems quickly and efficiently. Like a lot of other things, this is harder to do in distributed systems than it is with a monolith. At Poppulo we've been running a distributed system of hundreds of microservices in production for more than 4 years and we got to understand how critical this visibility is. If you want to succeed with operating a distributed system, observability should be an integral part of system design. I'll cover key techniques to build a clearer picture of distributed applications in production, including details on useful health checks, best practices for instrumentation with metrics, logging and tracing.