SlideShare a Scribd company logo
1 of 70
Download to read offline
©2008–22 New Relic, Inc. All rights reserved
Daniel Kim, Lead Developer Relations Engineer @ New Relic
Kafka
Tracing
with
OpenTelemetry
©2008–22 New Relic, Inc. All rights reserved
Daniel Kim
Lead Developer Advocate, New Relic
- Likes spicy food (hotpot)
- Likes spicy talks
- Likes spicy tweets
- Follow me on @learnwdaniel
©2008–22 New Relic, Inc. All rights reserved
Agenda
- Why trace Kafka
- What is Distributed Tracing?
- What is OpenTelemetry?
- Implementing OpenTelemetry with Kafka
- Live Demo
©2008–22 New Relic, Inc. All rights reserved
I decided to do a live demo
©2008–22 New Relic, Inc. All rights reserved
Live Demo
Kafka Producer
Kafka Consumer
©2008–22 New Relic, Inc. All rights reserved
We need test data - so please ask questions!
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Trace
Why
Kafka?
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
©2008–22 New Relic, Inc. All rights reserved
NRDB
200 petabytes a month
= 9,000,000 records/sec
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
HTTP Requests
NRDB
(Durable Storage)
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Consumer
Microservice
Event 1
Event 2
Event 3 Consumer
Microservice
Consumer
Microservice
Partition
Topic
Broker
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Aggregate
Normalize
Decorate
Transform
NRDB
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Aggregate
Normalize
Decorate
Transform
NRDB
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Kafka retains messages for a
set amount of time so
consumers can potentially go
down or lag behind incoming
data and recover without
data loss
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Why do you need to trace Kafka?
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Agent
How do we optimize “time to glass”?
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Service
Service
Service
Service NRDB
1. Debug Bottlenecks in the ingest pipeline
©2008–22 New Relic, Inc. All rights reserved
Service
Service
[Canary]
Service
Service
2. Get visibility into Downstream impacts of your service
©2008–22 New Relic, Inc. All rights reserved
Service
Service]
Service
Service
3. Measure Data Loss
©2008–22 New Relic, Inc. All rights reserved
Service A
4. Fine Tune Kafka Config (Handling BackPressure)
Consumer
Producer
©2008–22 New Relic, Inc. All rights reserved
Service A
Consumer
Producer
4. Fine Tune Kafka Config (Handling BackPressure)
©2008–22 New Relic, Inc. All rights reserved
Producer
Batch 1
5. Fine Tune Kafka Config (Batching/Compressing Messages)
Message 1
Message 2
Message 3
Message 4
Message 5
Message 6
📦 batch.size = 6
Compression
©2008–22 New Relic, Inc. All rights reserved
Producer
Batch 2
5. Fine Tune Kafka Config (Batching/Compressing Messages)
Message 1
Message 2
Message 3
📦 batch.size = 6
⌛linger.ms = 20
📦 batch.size = 6
Compression
©2008–22 New Relic, Inc. All rights reserved
is
What
Distributed Tracing?
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
What is Distributed Tracing?
User
My Awesome
API
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
What is Distributed Tracing?
User
My Awesome
API
…. 13 SECONDS LATER
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
What is Distributed Tracing?
User
My Awesome API
…. 13 SECONDS LATER
A B
D C
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
How do we connect the dots?
Context
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Context in action
My Awesome API
A
B
We only see one side of the
transaction.
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Context in action
My Awesome API
A .Here A is sending a payload to
somewhere.
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Context in action
My Awesome API
B
Here B is receiving a payload
from somewhere
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Context in action
My Awesome API
A
B
Each Individual Span gets
its own id
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Context in action
My Awesome API
A
B
Each Individual Span gets
its own id (transaction)
And the whole trace gets its
own id (parentid)
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Context in action
My Awesome API
A
B
Each Individual Span gets
its own id (transaction)
And the whole trace gets its
own id (parentid)
Both data points get
encoded as a set of HTTP
headers
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Context in action
My Awesome API
A
B
transaction=12H4
parentid=x1Fd
Context
x1Fd
Overall Trace 12H4
And the parent span is x1FD
h4a5
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Context in action
My Awesome API
A
B
transaction=12H4
parentid=x1Fd
Received by A
x1Fd
Overall Trace 12H4
And the parent span is x1FD
h4a5
x1Fd
h4a5
12H4
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
A
B
What if we have microservices with
different languages or frameworks?
What if microservices are using
different observability systems?
What if we have decoupled
microservices via Kafka?
B
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
W3C standard
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
A C
X-NEWRELIC : mY1d3
B
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
A C
TraceParent: …
TraceState: NR=...
TraceParent: …
TraceState: NR=...
B
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
OSS Standard for Context Propagation
traceparent
tracestate
A
B
W3C Trace Context
enables cross-vendor
interoperation of traces
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Adoption of W3C Tracing standard
https://github.com/w3c/trace-context/blob/main/implementations.md
Vendors adopting W3C
OpenTelemetry
©2008–22 New Relic, Inc. All rights reserved
is
What
OpenTelemetry
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
What is OpenTelemetry?
+ =
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
What is OpenTelemetry?
My Awesome API
A B
D C
It’s an Open Source
Standard for
instrumentation
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
What is OpenTelemetry?
My Awesome API
A B
D C
Vendor Neutral
Export data into multiple data
backends for processing
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Distributed Tracing
Implementing
For Kafka
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
What happens with Kafka?
My Awesome API
A
B
Context is lost and
transaction is broken
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
What happens with Kafka?
Consumer
Producer
Kafka decouples producers and consumers
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
CONTEXT
Adding
to traces
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Otelsarama injects context into
Kafka messages, allowing you to
trace asynchronous messages
©2008–22 New Relic, Inc. All rights reserved
Kafka Headers
Message
HEADER
Add metadata about your record without
changing the content of your message
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Kafka Headers
Message
Let’s inject context (W3C Tracing Headers)
TraceState
TraceParent
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Kafka Headers
Producer
TraceState
Message
TraceParent
Producer needs to
Inject context into
Kafka Header
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Kafka Headers
TraceState
Message
TraceParent
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Kafka Headers
Consumer
TraceState
Message
TraceParent
ask.danielkim.fun
Context (Parent Span)
Timestamps
Attributes
…
©2008–22 New Relic, Inc. All rights reserved
Let’s print out what is injected in the header
kcat
https://github.com/edenhill/kcat
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Reading the trace headers
00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01
Tracing Standard Version
number (Base16)
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Reading the trace headers
00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01
Trace ID (Base16)
This is the ID of the whole trace and is used to uniquely identify
a distributed trace through a system.
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Reading the trace headers
00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01
Trace
Parent(Base16)
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Reading the trace headers
00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01
Tracing Flags
This flag indicates that the Span
has been sampled and will be
exported
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
OTel in Action
Consumer
Producer
Traceid
Spanid
Timestamps
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Live Demo
github.com/lazyplatypus/kafka-otel
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
©2008–22 New Relic, Inc. All rights reserved
What’s Happening?
message
Consumer
The consumer was busy
handling other requests so the
message was sitting in the kafka
queue
©2008–22 New Relic, Inc. All rights reserved
What’s Happening?
ConsumeClaim
time.Sleep
printMessage
MarkMessage
tweetMessage
©2008–22 New Relic, Inc. All rights reserved
WE
What have
Learned?
©2008–22 New Relic, Inc. All rights reserved
Caveat:
Implementing Distributed Tracing is NOT EASY
Especially at Scale, a lot of Cross Functional work to make sure
tracing is implemented in every microservice in your set up.
It is gonna be a lot of data, even when you sample every 10k, 50k,
100k traces.
©2008–22 New Relic, Inc. All rights reserved
Distributed Tracing is AWESOME!
- It gets you better visibility into the WHY on things are behaving
funky and helps you optimize your data ingest pipeline
- It helps you be a better neighbor
- Helps you tweak config variables to make your system more
performant.
- OSS Instrumentation/standards is the future of observability!
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Thank you!
twitter.com/learnwdaniel
ask.danielkim.fun

More Related Content

What's hot

What's hot (20)

Introduction to Open Telemetry as Observability Library
Introduction to Open  Telemetry as Observability LibraryIntroduction to Open  Telemetry as Observability Library
Introduction to Open Telemetry as Observability Library
 
Understand your system like never before with OpenTelemetry, Grafana, and Pro...
Understand your system like never before with OpenTelemetry, Grafana, and Pro...Understand your system like never before with OpenTelemetry, Grafana, and Pro...
Understand your system like never before with OpenTelemetry, Grafana, and Pro...
 
Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...
 
Observability in Java: Getting Started with OpenTelemetry
Observability in Java: Getting Started with OpenTelemetryObservability in Java: Getting Started with OpenTelemetry
Observability in Java: Getting Started with OpenTelemetry
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Red Hat OpenShift Container Platform Overview
Red Hat OpenShift Container Platform OverviewRed Hat OpenShift Container Platform Overview
Red Hat OpenShift Container Platform Overview
 
OpenTelemetry For Developers
OpenTelemetry For DevelopersOpenTelemetry For Developers
OpenTelemetry For Developers
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Meetup OpenTelemetry Intro
Meetup OpenTelemetry IntroMeetup OpenTelemetry Intro
Meetup OpenTelemetry Intro
 
Everything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingEverything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed Tracing
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
 
Improve monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss toolsImprove monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss tools
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Observability: Beyond the Three Pillars with Spring
Observability: Beyond the Three Pillars with SpringObservability: Beyond the Three Pillars with Spring
Observability: Beyond the Three Pillars with Spring
 
Getting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaGetting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and Grafana
 
Containers Anywhere with OpenShift by Red Hat
Containers Anywhere with OpenShift by Red HatContainers Anywhere with OpenShift by Red Hat
Containers Anywhere with OpenShift by Red Hat
 
OpenShift 4, the smarter Kubernetes platform
OpenShift 4, the smarter Kubernetes platformOpenShift 4, the smarter Kubernetes platform
OpenShift 4, the smarter Kubernetes platform
 
Gathering Operational Intelligence in Complex Environments at Splunk
Gathering Operational Intelligence in Complex Environments at SplunkGathering Operational Intelligence in Complex Environments at Splunk
Gathering Operational Intelligence in Complex Environments at Splunk
 
Introduction to red hat agile integration (Red Hat Workshop)
Introduction to red hat agile integration (Red Hat Workshop)Introduction to red hat agile integration (Red Hat Workshop)
Introduction to red hat agile integration (Red Hat Workshop)
 
OpenShift 4 installation
OpenShift 4 installationOpenShift 4 installation
OpenShift 4 installation
 

Similar to Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summit London 2022

PostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical ReplicationPostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical Replication
Atsushi Torikoshi
 

Similar to Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summit London 2022 (20)

Building A System That Never Stops [FutureStack16 NYC]
Building  A System That Never Stops [FutureStack16 NYC]Building  A System That Never Stops [FutureStack16 NYC]
Building A System That Never Stops [FutureStack16 NYC]
 
Uncover the Root Cause of Kafka Performance Anomalies, Daniel Kim & Antón Rod...
Uncover the Root Cause of Kafka Performance Anomalies, Daniel Kim & Antón Rod...Uncover the Root Cause of Kafka Performance Anomalies, Daniel Kim & Antón Rod...
Uncover the Root Cause of Kafka Performance Anomalies, Daniel Kim & Antón Rod...
 
Best Practices for Measuring your Code Pipeline
Best Practices for Measuring your Code PipelineBest Practices for Measuring your Code Pipeline
Best Practices for Measuring your Code Pipeline
 
The Role of Standards in IoT Security
The Role of Standards in IoT SecurityThe Role of Standards in IoT Security
The Role of Standards in IoT Security
 
New Relic Infrastructure - New Integrations For Smarter and Faster Cloud Adop...
New Relic Infrastructure - New Integrations For Smarter and Faster Cloud Adop...New Relic Infrastructure - New Integrations For Smarter and Faster Cloud Adop...
New Relic Infrastructure - New Integrations For Smarter and Faster Cloud Adop...
 
PostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical ReplicationPostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical Replication
 
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
 
Media processing with serverless architecture
Media processing with serverless architectureMedia processing with serverless architecture
Media processing with serverless architecture
 
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
 
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
 
Building a System That Never Stops New Relic at Scale
Building a System That Never Stops New Relic at ScaleBuilding a System That Never Stops New Relic at Scale
Building a System That Never Stops New Relic at Scale
 
From shipping rpms to helm charts - Lessons learned and best practices
From shipping rpms to helm charts - Lessons learned and best practicesFrom shipping rpms to helm charts - Lessons learned and best practices
From shipping rpms to helm charts - Lessons learned and best practices
 
Understanding Microservice Latency for DevOps Teams: An Introduction to New R...
Understanding Microservice Latency for DevOps Teams: An Introduction to New R...Understanding Microservice Latency for DevOps Teams: An Introduction to New R...
Understanding Microservice Latency for DevOps Teams: An Introduction to New R...
 
“Quantum” Performance Effects: beyond the Core
“Quantum” Performance Effects: beyond the Core“Quantum” Performance Effects: beyond the Core
“Quantum” Performance Effects: beyond the Core
 
From Copycat Codelets to an AI Market Internet Protocol
From Copycat Codelets to an AI Market Internet ProtocolFrom Copycat Codelets to an AI Market Internet Protocol
From Copycat Codelets to an AI Market Internet Protocol
 
Software Stacks to enable SDN and NFV
Software Stacks to enable SDN and NFVSoftware Stacks to enable SDN and NFV
Software Stacks to enable SDN and NFV
 
Overpowered Kubernetes: CI/CD for K8s on Enterprise IaaS
Overpowered Kubernetes: CI/CD for K8s on Enterprise IaaSOverpowered Kubernetes: CI/CD for K8s on Enterprise IaaS
Overpowered Kubernetes: CI/CD for K8s on Enterprise IaaS
 
The impact of IOT - exchange cala - 2015
The impact of IOT - exchange cala - 2015The impact of IOT - exchange cala - 2015
The impact of IOT - exchange cala - 2015
 
Monitoring the Dynamic Nature of Cloud Computing
Monitoring the Dynamic Nature of Cloud ComputingMonitoring the Dynamic Nature of Cloud Computing
Monitoring the Dynamic Nature of Cloud Computing
 
Hardening Your CI/CD Pipelines with GitOps and Continuous Security
Hardening Your CI/CD Pipelines with GitOps and Continuous SecurityHardening Your CI/CD Pipelines with GitOps and Continuous Security
Hardening Your CI/CD Pipelines with GitOps and Continuous Security
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
HostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
HostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 

Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summit London 2022

  • 1. ©2008–22 New Relic, Inc. All rights reserved Daniel Kim, Lead Developer Relations Engineer @ New Relic Kafka Tracing with OpenTelemetry
  • 2. ©2008–22 New Relic, Inc. All rights reserved Daniel Kim Lead Developer Advocate, New Relic - Likes spicy food (hotpot) - Likes spicy talks - Likes spicy tweets - Follow me on @learnwdaniel
  • 3. ©2008–22 New Relic, Inc. All rights reserved Agenda - Why trace Kafka - What is Distributed Tracing? - What is OpenTelemetry? - Implementing OpenTelemetry with Kafka - Live Demo
  • 4. ©2008–22 New Relic, Inc. All rights reserved I decided to do a live demo
  • 5. ©2008–22 New Relic, Inc. All rights reserved Live Demo Kafka Producer Kafka Consumer
  • 6. ©2008–22 New Relic, Inc. All rights reserved We need test data - so please ask questions! ask.danielkim.fun
  • 7. ©2008–22 New Relic, Inc. All rights reserved Trace Why Kafka? ask.danielkim.fun
  • 8. ©2008–22 New Relic, Inc. All rights reserved
  • 9. ©2008–22 New Relic, Inc. All rights reserved NRDB 200 petabytes a month = 9,000,000 records/sec ask.danielkim.fun
  • 10. ©2008–22 New Relic, Inc. All rights reserved HTTP Requests NRDB (Durable Storage) ask.danielkim.fun
  • 11. ©2008–22 New Relic, Inc. All rights reserved Consumer Microservice Event 1 Event 2 Event 3 Consumer Microservice Consumer Microservice Partition Topic Broker ask.danielkim.fun
  • 12. ©2008–22 New Relic, Inc. All rights reserved Aggregate Normalize Decorate Transform NRDB ask.danielkim.fun
  • 13. ©2008–22 New Relic, Inc. All rights reserved Aggregate Normalize Decorate Transform NRDB ask.danielkim.fun
  • 14. ©2008–22 New Relic, Inc. All rights reserved Kafka retains messages for a set amount of time so consumers can potentially go down or lag behind incoming data and recover without data loss ask.danielkim.fun
  • 15. ©2008–22 New Relic, Inc. All rights reserved Why do you need to trace Kafka? ask.danielkim.fun
  • 16. ©2008–22 New Relic, Inc. All rights reserved Agent How do we optimize “time to glass”? ask.danielkim.fun
  • 17. ©2008–22 New Relic, Inc. All rights reserved Service Service Service Service NRDB 1. Debug Bottlenecks in the ingest pipeline
  • 18. ©2008–22 New Relic, Inc. All rights reserved Service Service [Canary] Service Service 2. Get visibility into Downstream impacts of your service
  • 19. ©2008–22 New Relic, Inc. All rights reserved Service Service] Service Service 3. Measure Data Loss
  • 20. ©2008–22 New Relic, Inc. All rights reserved Service A 4. Fine Tune Kafka Config (Handling BackPressure) Consumer Producer
  • 21. ©2008–22 New Relic, Inc. All rights reserved Service A Consumer Producer 4. Fine Tune Kafka Config (Handling BackPressure)
  • 22. ©2008–22 New Relic, Inc. All rights reserved Producer Batch 1 5. Fine Tune Kafka Config (Batching/Compressing Messages) Message 1 Message 2 Message 3 Message 4 Message 5 Message 6 📦 batch.size = 6 Compression
  • 23. ©2008–22 New Relic, Inc. All rights reserved Producer Batch 2 5. Fine Tune Kafka Config (Batching/Compressing Messages) Message 1 Message 2 Message 3 📦 batch.size = 6 ⌛linger.ms = 20 📦 batch.size = 6 Compression
  • 24. ©2008–22 New Relic, Inc. All rights reserved is What Distributed Tracing? ask.danielkim.fun
  • 25. ©2008–22 New Relic, Inc. All rights reserved What is Distributed Tracing? User My Awesome API ask.danielkim.fun
  • 26. ©2008–22 New Relic, Inc. All rights reserved What is Distributed Tracing? User My Awesome API …. 13 SECONDS LATER ask.danielkim.fun
  • 27. ©2008–22 New Relic, Inc. All rights reserved What is Distributed Tracing? User My Awesome API …. 13 SECONDS LATER A B D C ask.danielkim.fun
  • 28. ©2008–22 New Relic, Inc. All rights reserved How do we connect the dots? Context ask.danielkim.fun
  • 29. ©2008–22 New Relic, Inc. All rights reserved Context in action My Awesome API A B We only see one side of the transaction. ask.danielkim.fun
  • 30. ©2008–22 New Relic, Inc. All rights reserved Context in action My Awesome API A .Here A is sending a payload to somewhere. ask.danielkim.fun
  • 31. ©2008–22 New Relic, Inc. All rights reserved Context in action My Awesome API B Here B is receiving a payload from somewhere ask.danielkim.fun
  • 32. ©2008–22 New Relic, Inc. All rights reserved Context in action My Awesome API A B Each Individual Span gets its own id ask.danielkim.fun
  • 33. ©2008–22 New Relic, Inc. All rights reserved Context in action My Awesome API A B Each Individual Span gets its own id (transaction) And the whole trace gets its own id (parentid) ask.danielkim.fun
  • 34. ©2008–22 New Relic, Inc. All rights reserved Context in action My Awesome API A B Each Individual Span gets its own id (transaction) And the whole trace gets its own id (parentid) Both data points get encoded as a set of HTTP headers ask.danielkim.fun
  • 35. ©2008–22 New Relic, Inc. All rights reserved Context in action My Awesome API A B transaction=12H4 parentid=x1Fd Context x1Fd Overall Trace 12H4 And the parent span is x1FD h4a5 ask.danielkim.fun
  • 36. ©2008–22 New Relic, Inc. All rights reserved Context in action My Awesome API A B transaction=12H4 parentid=x1Fd Received by A x1Fd Overall Trace 12H4 And the parent span is x1FD h4a5 x1Fd h4a5 12H4 ask.danielkim.fun
  • 37. ©2008–22 New Relic, Inc. All rights reserved A B What if we have microservices with different languages or frameworks? What if microservices are using different observability systems? What if we have decoupled microservices via Kafka? B ask.danielkim.fun
  • 38. ©2008–22 New Relic, Inc. All rights reserved W3C standard ask.danielkim.fun
  • 39. ©2008–22 New Relic, Inc. All rights reserved A C X-NEWRELIC : mY1d3 B ask.danielkim.fun
  • 40. ©2008–22 New Relic, Inc. All rights reserved A C TraceParent: … TraceState: NR=... TraceParent: … TraceState: NR=... B ask.danielkim.fun
  • 41. ©2008–22 New Relic, Inc. All rights reserved OSS Standard for Context Propagation traceparent tracestate A B W3C Trace Context enables cross-vendor interoperation of traces ask.danielkim.fun
  • 42. ©2008–22 New Relic, Inc. All rights reserved Adoption of W3C Tracing standard https://github.com/w3c/trace-context/blob/main/implementations.md Vendors adopting W3C OpenTelemetry
  • 43. ©2008–22 New Relic, Inc. All rights reserved is What OpenTelemetry ask.danielkim.fun
  • 44. ©2008–22 New Relic, Inc. All rights reserved What is OpenTelemetry? + = ask.danielkim.fun
  • 45. ©2008–22 New Relic, Inc. All rights reserved What is OpenTelemetry? My Awesome API A B D C It’s an Open Source Standard for instrumentation ask.danielkim.fun
  • 46. ©2008–22 New Relic, Inc. All rights reserved What is OpenTelemetry? My Awesome API A B D C Vendor Neutral Export data into multiple data backends for processing ask.danielkim.fun
  • 47. ©2008–22 New Relic, Inc. All rights reserved Distributed Tracing Implementing For Kafka ask.danielkim.fun
  • 48. ©2008–22 New Relic, Inc. All rights reserved What happens with Kafka? My Awesome API A B Context is lost and transaction is broken ask.danielkim.fun
  • 49. ©2008–22 New Relic, Inc. All rights reserved What happens with Kafka? Consumer Producer Kafka decouples producers and consumers ask.danielkim.fun
  • 50. ©2008–22 New Relic, Inc. All rights reserved CONTEXT Adding to traces ask.danielkim.fun
  • 51. ©2008–22 New Relic, Inc. All rights reserved Otelsarama injects context into Kafka messages, allowing you to trace asynchronous messages
  • 52. ©2008–22 New Relic, Inc. All rights reserved Kafka Headers Message HEADER Add metadata about your record without changing the content of your message ask.danielkim.fun
  • 53. ©2008–22 New Relic, Inc. All rights reserved Kafka Headers Message Let’s inject context (W3C Tracing Headers) TraceState TraceParent ask.danielkim.fun
  • 54. ©2008–22 New Relic, Inc. All rights reserved Kafka Headers Producer TraceState Message TraceParent Producer needs to Inject context into Kafka Header ask.danielkim.fun
  • 55. ©2008–22 New Relic, Inc. All rights reserved Kafka Headers TraceState Message TraceParent ask.danielkim.fun
  • 56. ©2008–22 New Relic, Inc. All rights reserved Kafka Headers Consumer TraceState Message TraceParent ask.danielkim.fun Context (Parent Span) Timestamps Attributes …
  • 57. ©2008–22 New Relic, Inc. All rights reserved Let’s print out what is injected in the header kcat https://github.com/edenhill/kcat ask.danielkim.fun
  • 58. ©2008–22 New Relic, Inc. All rights reserved Reading the trace headers 00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01 Tracing Standard Version number (Base16) ask.danielkim.fun
  • 59. ©2008–22 New Relic, Inc. All rights reserved Reading the trace headers 00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01 Trace ID (Base16) This is the ID of the whole trace and is used to uniquely identify a distributed trace through a system. ask.danielkim.fun
  • 60. ©2008–22 New Relic, Inc. All rights reserved Reading the trace headers 00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01 Trace Parent(Base16) ask.danielkim.fun
  • 61. ©2008–22 New Relic, Inc. All rights reserved Reading the trace headers 00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01 Tracing Flags This flag indicates that the Span has been sampled and will be exported ask.danielkim.fun
  • 62. ©2008–22 New Relic, Inc. All rights reserved OTel in Action Consumer Producer Traceid Spanid Timestamps ask.danielkim.fun
  • 63. ©2008–22 New Relic, Inc. All rights reserved Live Demo github.com/lazyplatypus/kafka-otel ask.danielkim.fun
  • 64. ©2008–22 New Relic, Inc. All rights reserved
  • 65. ©2008–22 New Relic, Inc. All rights reserved What’s Happening? message Consumer The consumer was busy handling other requests so the message was sitting in the kafka queue
  • 66. ©2008–22 New Relic, Inc. All rights reserved What’s Happening? ConsumeClaim time.Sleep printMessage MarkMessage tweetMessage
  • 67. ©2008–22 New Relic, Inc. All rights reserved WE What have Learned?
  • 68. ©2008–22 New Relic, Inc. All rights reserved Caveat: Implementing Distributed Tracing is NOT EASY Especially at Scale, a lot of Cross Functional work to make sure tracing is implemented in every microservice in your set up. It is gonna be a lot of data, even when you sample every 10k, 50k, 100k traces.
  • 69. ©2008–22 New Relic, Inc. All rights reserved Distributed Tracing is AWESOME! - It gets you better visibility into the WHY on things are behaving funky and helps you optimize your data ingest pipeline - It helps you be a better neighbor - Helps you tweak config variables to make your system more performant. - OSS Instrumentation/standards is the future of observability! ask.danielkim.fun
  • 70. ©2008–22 New Relic, Inc. All rights reserved Thank you! twitter.com/learnwdaniel ask.danielkim.fun