Everyone wants observability into their systems, but teams find themselves with too many vendors and tools, each with its own API, SDK, agent, and collector.
In this talk I will present OpenTelemetry, an ambitious open source project that promises a unified framework for collecting observability data. With OpenTelemetry you can instrument your application in a vendor-agnostic way, and then analyze the telemetry data in the backend tool of your choice, whether Prometheus, Jaeger, Zipkin, or others.
I will cover the current state of the various OpenTelemetry projects (across programming languages, exporters, receivers, and protocols), some of which are not yet GA, and provide useful guidance on how to get started.
OpenTelemetry (a.k.a. OTel)
“OpenTelemetry is an observability framework –
software and tools that assist in generating and
capturing telemetry data from cloud-native software.”
Across Traces, Metrics, Logs
OpenTracing + OpenCensus = OpenTelemetry
Second most active CNCF project
Source: CNCF Dev Stats
OpenTelemetry
A unified set of vendor-agnostic APIs, SDKs and tools
for generating and collecting telemetry data, and
then exporting it to a variety of analysis tools.
Generate Emit Collect Process Export
APPLICATION COLLECTOR
OTLP OTLP
OpenTelemetry
Cross-language requirements for all
OpenTelemetry implementations
API specification | SDK specification | Data specification
For traces, metrics and logs
Generate Emit Collect Process Export
APPLICATION COLLECTOR
OpenTelemetry Collector
Generate Emit Collect Process Export
APPLICATION COLLECTOR
Receivers (Jaeger, Zipkin, … Receiver N) → Processors (Processor 0 … Processor N) → fan out → Exporters (Prometheus, Logz.io, … Exporter N)
OpenTelemetry Protocol (OTLP)
Generate Emit
APPLICATION
Collect Process Export
OTLP OTLP
Transport: gRPC and HTTP 1.1 | Encoding: Protobuf | Telemetry data model
State of the signals

Logs – Experimental (some specs still in draft)
• Protocol and Collector are experimental
• API and SDK are in draft stage
• Focusing first on integration with existing logging systems
• Log appenders are under development in many languages

Metrics – Expecting stability by end of 2021
• Protocol and API are stable
• SDK is in feature freeze (soon Stable)
• Prototyped in Java, .NET, and Python
• Collector is still experimental, incl. Prometheus support

Traces – Stable (i.e. GA)
• API, SDK, Protocol, Collector are stable
• Client libraries >v1.0 for Java, Go, .NET, Python, C++, JavaScript
• Working on Ruby, PHP, Erlang
• Long-term support, backwards compatibility, and dependency isolation
How do I get started with OpenTelemetry?
For each signal (Tracing, Metrics, Logging), check the maturity of each component – API, SDK, Collector, Protocol, Receivers, Exporters – and of the client library for your language (Java, Go, .NET, C++, …).
How many tools does a company use (on average) to collect telemetry data from its systems? Logs, metrics, traces? App, infra?
Organizations use 5-10 different tools to collect telemetry from their systems
(think Fluentd, Filebeat, Metricbeat, Datadog, New Relic, Prometheus, StatsD, collectd…)
You can reduce it to one standard, unified platform
Here's the story of OTel
Logz.io provides a cloud-native observability platform based on popular open source projects (Elasticsearch, Prometheus, Jaeger, OpenTelemetry…)
WE’RE RECRUITING. Pay us a visit
Advocate of open source software, open standards and communities
Organize the local CNCF chapter in Tel Aviv, monthly meetup
Run a podcast – OOtalks – a finalist for Best DevOps Podcast Series in the 2021 DevOps Dozen² Awards!
Observability: the ability to understand the state of our system based on the telemetry data it emits.
The vision: unified observability across different signal types (logs/metrics/traces) and across different sources (frontend code, backend code, open source tools, cloud services ...)
Gain unified observability across all of these signal types and sources
The reality is much more fragmented – we use many tools for our observability.
Each tool and each vendor has proprietary APIs and SDKs for instrumenting (Datadog, Splunk, Zipkin, New Relic, Jaeger, … client libraries),
and also proprietary agents, daemons and collectors that gather the telemetry and run aggregations, sampling and other processing,
and a proprietary protocol and data model to transmit the telemetry to the analytics backend.
This is not only an operational headache and a vendor lock-in issue:
it creates tight coupling between telemetry collection and the telemetry storage and analysis backend,
and makes it very difficult to correlate data between these data silos and gain unified observability across them.
That's what OpenTelemetry sets out to solve.
AKA OTel
Across all the observability pillars: traces, metrics and logs. One framework to rule them all
It's an incubating project under the CNCF, a merge of OpenTracing and OpenCensus.
OTel is adopted by all the major vendors, all the monitoring tools, and the cloud providers (AWS, Azure).
It is the second most active CNCF project, behind only Kubernetes.
Source: CNCF dev stats https://all.devstats.cncf.io/d/1/activity-repository-groups?orgId=1
Let's dive deeper into what OTel provides us:
OpenTelemetry provides the libraries, agents, and other components that you need to capture telemetry from your services. Specifically,
captures metrics, distributed traces, resource metadata, and logs (logging support is incubating now) from your backend and client applications
sends this data to backends like Prometheus, Jaeger, Zipkin, and others for processing.
The OpenTelemetry specification describes the cross-language requirements and expectations for all OpenTelemetry implementations.
includes: API spec, SDK spec and Data spec (e.g. semantic convention)
https://github.com/open-telemetry/opentelemetry-specification
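The semantic conventions part of the data spec standardizes attribute names, so for example an HTTP span carries the same keys regardless of language or vendor. Illustrative sketch (the keys are from the HTTP semantic conventions; the values are made up):

```python
# Span attributes following the OpenTelemetry HTTP semantic conventions.
# Key names come from the spec; values here are illustrative only.
http_span_attributes = {
    "http.method": "GET",
    "http.url": "https://shop.example.com/cart",
    "http.status_code": 200,
}
```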
It’s not a component but rather governs all the components
For traces, metrics and logs
This solves the pain of fragmentation, where each vendor, each programming language and each signal has its own conventions.
a unified data spec and semantic conventions will also enable correlation across signals and sources
https://github.com/open-telemetry/opentelemetry-specification
for instrumenting your app: one API and SDK per language (based on a unified specification)
One API and SDK per language, which include the interfaces and implementations that define and create distributed traces and metrics, manage sampling and context propagation, etc.
Language-specific integrations for popular web frameworks, storage clients, RPC libraries, etc. that (when enabled) automatically capture relevant traces and metrics and handle context propagation
Automatic instrumentation agents that can collect telemetry from some applications without requiring code changes
Language-specific exporters that allow SDKs to send captured traces and metrics to any supported backends
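Context propagation, mentioned above, typically rides on the W3C Trace Context `traceparent` HTTP header. A rough stdlib-only sketch of its shape (the helper names here are hypothetical, not part of any OTel SDK):

```python
import re
import secrets

def make_traceparent(sampled=True):
    """Build a W3C traceparent header: version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def parse_traceparent(header):
    """Extract the trace id, parent span id and sampled flag."""
    m = _TRACEPARENT.match(header)
    if not m:
        raise ValueError(f"invalid traceparent: {header!r}")
    trace_id, span_id, flags = m.groups()
    return {"trace_id": trace_id,
            "parent_span_id": span_id,
            "sampled": flags == "01"}
```

A downstream service parses the incoming header and starts its spans under the same trace id, which is what stitches a distributed trace together.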
The OpenTelemetry Collector can collect data from OpenTelemetry SDKs and other sources, and then export this telemetry to any supported backend
Built like a data processing pipeline: receivers in multiple protocols, processing and aggregation, then exporters in multiple protocols
can collect telemetry from our app (backend/frontend) or from other infra components (Kubernetes, Docker, Kafka, MySQL, Redis, httpd, AWS X-Ray, GCP Pub/Sub, collectd…)
processing can do things like filter, modify, batch, sample, etc.
Exporters exist for AWS X-Ray, Azure Monitor, Google, Datadog, Dynatrace, Splunk, Sumo Logic, Logz.io, Prometheus, Jaeger, Zipkin, Kafka…
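Such a pipeline is declared in the Collector's YAML config. A minimal sketch (endpoint values are placeholders):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:          # buffer and batch telemetry before export

exporters:
  jaeger:
    endpoint: jaeger-collector:14250   # placeholder endpoint
    insecure: true
  prometheus:
    endpoint: "0.0.0.0:8889"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
```

Each signal gets its own pipeline wiring receivers through processors to exporters, which is exactly the fan-out pictured on the slide.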
OTLP is a general-purpose telemetry data delivery protocol - between telemetry sources, intermediate nodes such as collectors and telemetry backends
OTLP defines the encoding of telemetry data and the protocol used to exchange data between the client and the server.
it’s a request/response style protocol for client-server communications.
includes the data model
OTLP is implemented over gRPC and HTTP 1.1 transports.
It currently supports binary Protobuf encoding of the payload; support for JSON encoding is planned.
OTLP provides wire-level compatibility for the binary Protobuf serialization
you can get the .proto files and can generate raw gRPC client libraries from them yourself
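The service definition in those .proto files has roughly this shape – a from-memory sketch, not the verbatim file; see the opentelemetry-proto repository for the authoritative definition:

```protobuf
// Sketch of the OTLP trace service, per opentelemetry-proto.
syntax = "proto3";

service TraceService {
  // Clients push a batch of spans; the server acknowledges delivery.
  rpc Export(ExportTraceServiceRequest) returns (ExportTraceServiceResponse) {}
}

message ExportTraceServiceRequest {
  // Spans grouped by the resource (service, host, ...) that produced them.
  repeated ResourceSpans resource_spans = 1;
}
```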
NOTE: the OTel Collector is NOT limited to OTLP – as said, it has receivers and exporters for many protocols.
Still, OTel as a project strives to provide a unified protocol as part of the holistic framework, and to enable correlation across the telemetry. The OTLP specification describes the encoding, transport, and delivery mechanism of telemetry data between telemetry sources, intermediate nodes such as collectors, and telemetry backends.
OpenTelemetry is an aggregate of multiple groups, each working on a different component of this huge endeavor: different groups handle the specifications for the different telemetry signals – tracing, logging and metrics – and other groups focus on the programming-language-specific clients, to name a few. Each group has its own release cadence, which means that different components of OpenTelemetry may be in different stages of the maturity lifecycle:
Draft → Experimental → Stable → Deprecated.
Stable is the equivalent of GA (generally available), which is what you'd want for running in a production environment. Stable components are covered by long-term support (e.g. all instrumentation written against the tracing API will be compatible with future minor versions, and supported for a minimum of three years after the next major version of the OpenTelemetry API). Experimental is a beta stage, which should enable testing the technology in evaluations and PoCs towards integration.
When evaluating OpenTelemetry for your project, you should map the status of the relevant components for your system:
The standard for the signal type of interest (traces/metrics/logs)
The protocol for the signal type of interest
The client library for the programming language(s) you use. Potentially also agents for instrumenting programming frameworks you use in your code.
Traces
Tracing API, SDK and Protocol specifications are stable, and the Collector is stable.
OpenTelemetry clients are versioned to v1.0 once their tracing implementation is complete.
Metrics
Protocol: Stable. The data model is stable and released as part of the OTLP protocol.
API: stable, SDK: feature-freeze
The metric API and SDK specification is currently being prototyped in Java, .NET, and Python.
Collector: experimental. Collector support for Prometheus is under development, in collaboration with the Prometheus community.
Experimental support for metric pipelines is available in the Collector.
Logging
Protocol: experimental. The data model is experimental and released as part of the OTLP protocol.
Collector: experimental. Log processing for many data formats has been added to the Collector, thanks to the donation of Stanza to the OpenTelemetry project.
On the API/SDK front: both are still in draft stage, focusing first on integration with existing logging systems.
Log appenders are currently under development in many languages. Log appenders allow OpenTelemetry tracing data, such as trace and span IDs, to be appended to existing logging systems.
An OpenTelemetry logging SDK is currently under development. This allows OpenTelemetry clients to ingest logging data from existing logging systems, outputting logs as part of OTLP along with tracing and metrics.
An OpenTelemetry logging API is not currently under development. We are focusing first on integration with existing logging systems. When metrics is complete, focus will shift to development of an OpenTelemetry logging API.
When evaluating OpenTelemetry for your project, you should map the status of the relevant components for your system:
The client library for the programming language(s) you use, and potentially also agents for instrumenting the programming frameworks you use in your code (we use Node.js with Hapi/Express, or Java with Spring)
The signal type of interest (traces/metrics/logs)
The protocol for the signal type of interest (especially if brownfield)
The backend tool
Get involved in the open source
Feedback on the guide
Reach out to me @horovits