Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summit London 2022
1. ©2008–22 New Relic, Inc. All rights reserved
Daniel Kim, Lead Developer Relations Engineer @ New Relic
Kafka Tracing with OpenTelemetry
2.
Daniel Kim
Lead Developer Advocate, New Relic
- Likes spicy food (hotpot)
- Likes spicy talks
- Likes spicy tweets
- Follow me on @learnwdaniel
3.
Agenda
- Why trace Kafka?
- What is Distributed Tracing?
- What is OpenTelemetry?
- Implementing OpenTelemetry with Kafka
- Live Demo
6.
We need test data - so please ask questions!
ask.danielkim.fun
9.
NRDB
200 petabytes a month
= 9,000,000 records/sec
10.
HTTP Requests
NRDB
(Durable Storage)
11.
[Diagram: a broker, a topic, and a partition holding Event 1, Event 2, and Event 3, with three consumer microservices]
12.
Aggregate
Normalize
Decorate
Transform
NRDB
14.
Kafka retains messages for a set amount of time, so consumers can go down or lag behind incoming data and recover without data loss
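The retention idea above can be sketched with a toy, in-memory log. This is not a Kafka client: `RetainedLog`, its method names, and the 60-second window are all hypothetical, chosen only to show a consumer resuming from its last committed offset after downtime.

```python
from dataclasses import dataclass, field


@dataclass
class RetainedLog:
    """Toy stand-in for one Kafka partition with time-based retention."""
    retention_seconds: float
    records: list = field(default_factory=list)  # (offset, timestamp, value)
    next_offset: int = 0

    def append(self, value, now):
        self.records.append((self.next_offset, now, value))
        self.next_offset += 1

    def expire(self, now):
        # Drop records older than the retention window.
        self.records = [r for r in self.records
                        if now - r[1] <= self.retention_seconds]

    def read_from(self, offset):
        # Return every retained record at or after the given offset.
        return [r for r in self.records if r[0] >= offset]


log = RetainedLog(retention_seconds=60)
for second in range(5):
    log.append(f"event-{second}", now=second)

committed_offset = 2   # consumer handled offsets 0 and 1, then went down
log.expire(now=30)     # still down, but 30 s is inside the retention window
recovered = [value for _, _, value in log.read_from(committed_offset)]
```

If the consumer stayed down past the retention window, the same `read_from` call would come back empty, which is the data-loss case tracing can help you spot.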
15.
Why do you need to trace Kafka?
16.
Agent
How do we optimize “time to glass”?
17.
1. Debug bottlenecks in the ingest pipeline
[Diagram: four services and NRDB]
18.
2. Get visibility into the downstream impact of your service
[Diagram: four services, one marked as a canary]
19.
3. Measure data loss
[Diagram: four services]
20.
4. Fine-tune Kafka config (handling backpressure)
[Diagram: Service A with a consumer and a producer]
22.
5. Fine-tune Kafka config (batching/compressing messages)
[Diagram: the producer fills Batch 1 with Message 1 through Message 6, followed by compression]
📦 batch.size = 6
23.
5. Fine-tune Kafka config (batching/compressing messages)
[Diagram: the producer's Batch 2 holds only Message 1 through Message 3, followed by compression]
📦 batch.size = 6
⌛ linger.ms = 20
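A rough sketch of how the two settings interact: a batch goes out when it is full, or when it has waited long enough. `ToyBatcher` is a hypothetical class, not the Kafka producer. It counts `batch_size` in messages to match the slide, whereas the real `batch.size` producer setting is measured in bytes. Compression is left out.

```python
class ToyBatcher:
    """Flushes when the batch is full or when it has lingered long enough."""

    def __init__(self, batch_size, linger_ms):
        self.batch_size = batch_size  # counted in messages here, as on the slide
        self.linger_ms = linger_ms
        self.pending = []
        self.first_message_at = None
        self.sent_batches = []

    def send(self, message, now_ms):
        if not self.pending:
            self.first_message_at = now_ms
        self.pending.append(message)
        if len(self.pending) >= self.batch_size:
            self._flush()

    def tick(self, now_ms):
        # Called periodically: flush a partial batch once linger_ms has passed.
        if self.pending and now_ms - self.first_message_at >= self.linger_ms:
            self._flush()

    def _flush(self):
        self.sent_batches.append(list(self.pending))
        self.pending = []
        self.first_message_at = None


batcher = ToyBatcher(batch_size=6, linger_ms=20)
for i in range(1, 7):
    batcher.send(f"Message {i}", now_ms=0)   # Batch 1 fills up and is sent
for i in range(1, 4):
    batcher.send(f"Message {i}", now_ms=5)   # Batch 2 is only half full
batcher.tick(now_ms=10)                      # too early, nothing is sent
batches_before_linger = len(batcher.sent_batches)
batcher.tick(now_ms=25)                      # 20 ms elapsed, Batch 2 is sent
```

A larger linger value trades latency for fuller batches, which is the kind of trade-off trace timings can make visible.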
24.
What is Distributed Tracing?
25.
What is Distributed Tracing?
User
My Awesome API
26.
What is Distributed Tracing?
User
My Awesome API
… 13 SECONDS LATER
27.
What is Distributed Tracing?
User
My Awesome API
… 13 SECONDS LATER
[Diagram: services A, B, C, and D behind My Awesome API]
28.
How do we connect the dots?
Context
29.
Context in action
My Awesome API
A
B
We only see one side of the transaction.
30.
Context in action
My Awesome API
A
Here A is sending a payload to somewhere.
31.
Context in action
My Awesome API
B
Here B is receiving a payload from somewhere.
32.
Context in action
My Awesome API
A
B
Each individual span gets its own ID
33.
Context in action
My Awesome API
A
B
Each individual span gets its own ID (passed downstream as parentid)
And the whole trace gets its own ID (transaction)
34.
Context in action
My Awesome API
A
B
Each individual span gets its own ID (passed downstream as parentid)
And the whole trace gets its own ID (transaction)
Both data points get encoded as a set of HTTP headers
35.
Context in action
My Awesome API
A
B
transaction=12H4
parentid=x1Fd
Context: the overall trace is 12H4 and the parent span is x1Fd
[Diagram: span IDs x1Fd and h4a5]
36.
Context in action
My Awesome API
A
B
transaction=12H4
parentid=x1Fd
Received by A
The overall trace is 12H4 and the parent span is x1Fd
[Diagram: span IDs x1Fd and h4a5 shown with trace ID 12H4]
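The hand-off in these slides can be sketched in a few lines. The header names `transaction` and `parentid` come from the slides. The function names, the dict-based span, and the 8-character IDs are assumptions made for illustration, not any tracer's real API.

```python
import secrets


def new_id():
    # Short random hex ID; real tracers use longer IDs.
    return secrets.token_hex(4)


def start_trace():
    """Service A: start a trace and its first span."""
    return {"trace_id": new_id(), "span_id": new_id(), "parent_id": None}


def inject(span):
    """Encode the context as HTTP headers, using the slide's header names."""
    return {"transaction": span["trace_id"], "parentid": span["span_id"]}


def extract_and_start_child(headers):
    """Service B: continue the trace from the incoming headers."""
    return {
        "trace_id": headers["transaction"],  # same trace as the caller
        "span_id": new_id(),                 # B's own span ID
        "parent_id": headers["parentid"],    # points back at A's span
    }


span_a = start_trace()
span_b = extract_and_start_child(inject(span_a))
```

Because B reuses the trace ID and records A's span as its parent, a backend can stitch both sides of the transaction together.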
37.
What if we have microservices with different languages or frameworks?
What if microservices are using different observability systems?
What if we have microservices decoupled via Kafka?
[Diagram: services A and B]
40.
[Diagram: services A, B, and C, with these headers shown twice]
TraceParent: …
TraceState: NR=...
41.
OSS Standard for Context Propagation
traceparent
tracestate
A
B
W3C Trace Context enables cross-vendor interoperation of traces
42.
Adoption of W3C Tracing standard
https://github.com/w3c/trace-context/blob/main/implementations.md
Vendors adopting W3C
OpenTelemetry
45.
What is OpenTelemetry?
My Awesome API
[Diagram: services A, B, C, and D behind My Awesome API]
It’s an open source standard for instrumentation
46.
What is OpenTelemetry?
My Awesome API
[Diagram: services A, B, C, and D behind My Awesome API]
Vendor neutral
Export data into multiple data backends for processing
47.
Implementing Distributed Tracing for Kafka
48.
What happens with Kafka?
My Awesome API
A
B
Context is lost and the transaction is broken
49.
What happens with Kafka?
Consumer
Producer
Kafka decouples producers and consumers
51.
Otelsarama (the OpenTelemetry instrumentation for the Sarama Go client) injects context into Kafka messages, allowing you to trace asynchronous messages
52.
Kafka Headers
Message
HEADER
Add metadata about your record without changing the content of your message
53.
Kafka Headers
Message
Let’s inject context (W3C Tracing Headers)
TraceState
TraceParent
54.
Kafka Headers
Producer
TraceState
Message
TraceParent
The producer needs to inject context into the Kafka header
55.
Kafka Headers
TraceState
Message
TraceParent
56.
Kafka Headers
Consumer
TraceState
Message
TraceParent
Context (Parent Span)
Timestamps
Attributes
…
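The producer and consumer steps from the last few slides, as a minimal sketch. `Message`, `inject_context`, and `extract_context` are hypothetical stand-ins, not a real Kafka client or otelsarama's API. The sketch assumes headers are (key, bytes) pairs and uses the lowercase header names from the W3C Trace Context slide.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Toy Kafka record: a payload plus a list of (key, bytes) headers."""
    value: bytes
    headers: list = field(default_factory=list)


def inject_context(message, traceparent, tracestate=""):
    # Producer side: add the context as headers; the payload is untouched.
    message.headers.append(("traceparent", traceparent.encode("utf-8")))
    if tracestate:
        message.headers.append(("tracestate", tracestate.encode("utf-8")))


def extract_context(message):
    # Consumer side: read the context back out of the headers.
    found = {key: raw.decode("utf-8") for key, raw in message.headers}
    return found.get("traceparent"), found.get("tracestate")


msg = Message(value=b"hello")
inject_context(msg, "00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01")
traceparent, tracestate = extract_context(msg)
```

The consumer can then start its own span with the extracted value as its parent, even though it never talks to the producer directly.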
57.
Let’s print out what is injected in the header
kcat
https://github.com/edenhill/kcat
58.
Reading the trace headers
00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01
Tracing standard version number (Base16): 00
59.
Reading the trace headers
00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01
Trace ID (Base16): 6626f84bd20a9a171a699a560387102b
This is the ID of the whole trace and is used to uniquely identify a distributed trace through a system.
60.
Reading the trace headers
00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01
Parent ID (Base16): 10ba2f2a74c884bf
This is the ID of the span that sent the message.
61.
Reading the trace headers
00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01
Trace flags (Base16): 01
This flag indicates that the span has been sampled and will be exported
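The four fields can be pulled apart with a short parser. This is a sketch for the version-00 layout shown on the slides, not a full W3C Trace Context validator. `parse_traceparent` and the field names in the returned dict are illustrative.

```python
import re

# version - trace ID (32 hex chars) - parent ID (16 hex chars) - flags
TRACEPARENT = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


def parse_traceparent(header):
    match = TRACEPARENT.match(header)
    if match is None:
        raise ValueError(f"not a valid traceparent: {header!r}")
    fields = match.groupdict()
    # Bit 0 of the flags byte is the "sampled" flag.
    fields["sampled"] = bool(int(fields["flags"], 16) & 0x01)
    return fields


parsed = parse_traceparent(
    "00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01"
)
```

Running this against a header printed with kcat is a quick way to check that the producer is injecting what you expect.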
62.
OTel in Action
Consumer
Producer
Traceid
Spanid
Timestamps
63.
Live Demo
github.com/lazyplatypus/kafka-otel
65.
What’s Happening?
message
Consumer
The consumer was busy handling other requests, so the message was sitting in the Kafka queue
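Once producer and consumer spans share a trace, the time a message sat in the queue is the gap between them. A small worked example, with a hypothetical `Span` type and made-up timings. In practice the two timestamps come from different hosts, so clock skew can distort the result.

```python
from dataclasses import dataclass


@dataclass
class Span:
    name: str
    start_ms: int
    end_ms: int


def queue_wait_ms(producer_span, consumer_span):
    # Time between the producer finishing its send and the consumer picking up.
    return max(0, consumer_span.start_ms - producer_span.end_ms)


produce = Span("produce", start_ms=0, end_ms=5)        # made-up timings
consume = Span("consume", start_ms=2005, end_ms=2030)  # made-up timings
wait = queue_wait_ms(produce, consume)
```

Here the message waited 2000 ms, far longer than either span took to run, which points at the consumer being busy rather than at slow processing.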
66.
What’s Happening?
ConsumeClaim
time.Sleep
printMessage
MarkMessage
tweetMessage
68.
Caveat:
Implementing Distributed Tracing is NOT EASY
Especially at scale, it takes a lot of cross-functional work to make sure tracing is implemented in every microservice in your setup.
It is going to be a lot of data, even when you sample one in every 10k, 50k, or 100k traces.
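One simple way to sample one trace in every N is to decide from the trace ID itself. The sketch below is an assumption-laden illustration, not OpenTelemetry's sampler: `keep_trace` and the modulo rule are hypothetical, and the trace IDs are random test data.

```python
import random


def keep_trace(trace_id_hex, one_in_n):
    # Decide from the trace ID itself, so every service that sees the same
    # trace ID makes the same keep/drop decision.
    return int(trace_id_hex, 16) % one_in_n == 0


rng = random.Random(0)  # fixed seed so the demo is repeatable
trace_ids = [f"{rng.getrandbits(128):032x}" for _ in range(10_000)]
kept = [t for t in trace_ids if keep_trace(t, one_in_n=10)]
```

Deciding per trace ID rather than per span keeps whole traces together, so a sampled trace is not missing its Kafka hop.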
69.
Distributed Tracing is AWESOME!
- It gets you better visibility into WHY things are behaving funky and helps you optimize your data ingest pipeline
- It helps you be a better neighbor
- It helps you tweak config variables to make your system more performant
- OSS instrumentation and standards are the future of observability!
70.
Thank you!
twitter.com/learnwdaniel