Eventing and streaming open a world of compelling new possibilities to our software and platform designs. They can reduce time to decision and action while lowering total platform cost. But they are not a panacea. Understanding the edges and limits of these architectures can help you avoid painful missteps. This talk will focus on event-driven and streaming architectures and how Apache Kafka can help you implement these. It will also discuss key tradeoffs you will face along the way from partitioning schemes to the impact of availability vs. consistency (CAP Theorem). Finally, we’ll discuss some challenges of scale for patterns like Event Sourcing and how you can use other tools and even features of Kafka to work around them. This talk assumes a basic understanding of Kafka and distributed computing but will include brief refresher sections.
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summit NYC 2019
1. Hard Truths About
Eventing and Streaming
Dan Rosanova
Group Principal Program Manager
Microsoft Azure Messaging
2. A brief history of messaging
• Old school messaging (System/360 QTAM and TCAM)
• IBM MQ
• RabbitMQ
• Service Bus
• ActiveMQ™
• ZeroMQ
• Apache Kafka®
• NATS
6. The queue is the arbiter of truth – which
simplifies many other aspects
• Each reader can just say 'give me the next'
• Messages are acked / completed individually
• The queue is a buffer to improve scale and
performance
20. There's something else that resembles this
• A cassette tape records a stream – recording
moves forward only
• You can play the tape over and over again
• A cassette tape actually has left and right
channels
• When you press record, they both record – but
the data on each channel is different
• In Kafka these channels are called partitions
21. A bit more on the partition concept
• Partition is essentially append only
• Reads are performed using a client-side cursor
• Reads are nondestructive
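The append-only log plus client-side cursor can be sketched in a few lines. This is a toy model with invented names, not the Kafka client API; it only illustrates why nondestructive reads and per-consumer offsets go together:

```python
class Partition:
    """A partition as an append-only log of records."""

    def __init__(self):
        self._log = []  # append-only: records are never mutated or removed

    def append(self, record):
        self._log.append(record)
        return len(self._log) - 1  # offset of the new record

    def read(self, offset):
        return self._log[offset]   # nondestructive: the record stays in the log


class Consumer:
    """Each consumer keeps its own offset; the log stores no per-message state."""

    def __init__(self, partition):
        self.partition = partition
        self.offset = 0  # the client-side cursor

    def poll(self):
        if self.offset >= len(self.partition._log):
            return None
        record = self.partition.read(self.offset)
        self.offset += 1
        return record


p = Partition()
for r in ["a", "b", "c"]:
    p.append(r)

c1, c2 = Consumer(p), Consumer(p)
print(c1.poll(), c1.poll())  # a b
print(c2.poll())             # a  -- c2's cursor is independent of c1's
```

Because reads only advance a cursor, any number of consumers can replay the same records without coordinating with each other.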
22. In a stream the partition is the Unit of Work
Streams are processed differently from batch data – normal functions cannot
operate on streams as a whole, since they hold potentially unlimited data;
formally, streams are codata (potentially unlimited), not data (which is finite).
27. Low cost
• There are no expensive indexes to maintain
• Because each partition is independent there is
no cross broker coordination necessary (other
than optional replication)
• Client-side cursor avoids the overhead of
traditional message brokers
• Data replication and ACK level is a choice of the
sender
31. Fan out and routing
• Partitioned streams (like Kafka) don’t offer
server-side filtering
• Every reader must read all the data
• As more readers want the data a network
imbalance develops
• Parse.ly Kafkapocalypse
(Diagram: each of several consumers pulls the full 10 MBps stream, so broker
egress grows to N MBps as readers are added.)
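The back-of-the-envelope math behind the diagram: without server-side filtering, every consumer reads everything, so broker egress grows linearly with the reader count. Numbers here are illustrative:

```python
# Ingress: what producers write to the broker.
produce_rate_mbps = 10

# Each reader must pull the full stream.
readers = 4

egress_mbps = produce_rate_mbps * readers
print(egress_mbps)  # 40 -- four times the ingress leaves the broker
```

This linear growth is what bit Parse.ly in the "Kafkapocalypse": adding consumers multiplied network load until the cluster saturated.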
32. Streams are not queues
• The Unit of Work is not an individual message
• This means processing individual messages
gets complicated
• Cursor management becomes a big challenge
• There is no inherent dead letter capability
• People start adding these ‘features’ in and end
up recreating a queue
33. CAP Theorem
In theoretical computer science the CAP theorem states that
it is impossible for a distributed computer system to
simultaneously provide all three of the following guarantees:
Consistency, Availability, Partition tolerance
34. What does CAP mean for streams?
Consistency: Data should
produce the same results
when read multiple times –
i.e. it should be stable and
durable
Availability: The place data is
written to should always be
available to write to
Partition tolerance: the
ability to continue
functioning when one part of
the system becomes
separated from another
35. Or put another way
when a network partition happens, which over
time is inevitable, then you must make a
choice...
36. This is your last chance.
After this, there is no turning back...
Consistency
37. You must decide which of these two is most
significant
Consistency Availability
38. Partitioning schemes
Not all keys are created
equal
You need to be careful to
avoid hot keys
It’s not always something
you can avoid
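A hot key is easy to demonstrate with key-based partition assignment. Kafka's default partitioner hashes the key with murmur2; CRC32 below is just a stand-in deterministic hash for illustration, and all names are invented:

```python
import zlib
from collections import Counter


def partition_for(key: bytes, num_partitions: int) -> int:
    """Assign a record to a partition by hashing its key (CRC32 as a stand-in)."""
    return zlib.crc32(key) % num_partitions


# A skewed key distribution: one "hot" key dominates the traffic.
events = [b"user-hot"] * 90 + [b"user-%d" % i for i in range(10)]

load = Counter(partition_for(k, 4) for k in events)
print(load)  # one partition carries at least 90% of the records
```

However evenly the hash spreads distinct keys, it cannot spread the traffic of a single key: all 90 `user-hot` records land on the same partition.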
40. Adding partitions
You’ve identified a hot
partition
You add more partitions
to handle the scale
The result is a data split
(Diagram: records 1–4 sit on Partition 1; after the new partition is added,
records 5–7 land on Partition 2, so related data is now split across partitions.)
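Why the split happens: partition assignment is hash-mod-N, so changing N remaps a large fraction of keys. A sketch (CRC32 again stands in for Kafka's murmur2; the effect is the same):

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    return zlib.crc32(key) % num_partitions


keys = [b"order-%d" % i for i in range(1000)]

# Count how many keys land on a different partition after going from 2 to 3.
moved = sum(1 for k in keys if partition_for(k, 2) != partition_for(k, 3))
print(f"{moved} of {len(keys)} keys change partition when going 2 -> 3")
```

Roughly two thirds of keys move in this 2-to-3 case, which is why ordering and locality guarantees for a key only hold within one partition-count era of the topic.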
43. Strategies for dealing with failures in
messaging and streaming
Stop • Drop • Retry • Deadletter
44. Stop
• Simply stop reading – or writing the stream
• Wait until someone elsewhere has fixed the
problem and then resume
• Appropriate for some scenarios, but not all
• Probably a good idea to include a notification
45. Drop
• If the messages aren’t that important, just drop
them
• Up to a certain point they may not matter
• This is a good strategy for non-mission critical
streams
• But not so good for scenarios requiring strong
consistency guarantees
• Definitely a good idea to include a notification
46. Retry
• Try again and see if it works
• Perhaps the error is transient
• Be aware of impact on downstream systems -
idempotence
47. Deadletter
• Put the data somewhere off your hot path so
that you can go back and handle it later
• Does not interrupt your flow
• Works for poisoned messages
48. Combining strategies
• Often no one strategy will exactly match your
needs
• You can combine these to achieve the policy
that is right for you
• E.G. Retry three times, then deadletter
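The "retry three times, then deadletter" policy fits in a few lines. A minimal sketch with invented names (in a real system the dead-letter store would be another topic or queue, not a list):

```python
def process_with_policy(record, handler, retries=3, dead_letters=None):
    """Retry the handler a few times, then park the record off the hot path."""
    for attempt in range(retries):
        try:
            return handler(record)
        except Exception as exc:
            last_error = exc
    # Poisoned message: dead-letter it and keep the cursor moving.
    if dead_letters is not None:
        dead_letters.append((record, str(last_error)))
    return None


dead_letters = []


def flaky_handler(record):
    if record == "bad":
        raise ValueError("cannot parse")
    return record.upper()


for r in ["ok", "bad", "fine"]:
    process_with_policy(r, flaky_handler, dead_letters=dead_letters)

print(dead_letters)  # [('bad', 'cannot parse')]
```

Note that the stream keeps flowing past the poisoned record, which is the whole point: the failure is isolated without stopping or dropping everything behind it.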
50. What are event driven architectures
• Events are notifications that something
happened
• This is different from traditional messages,
which are the thing itself (the command)
• Event Driven Architectures are reactive in
nature
• State is derived from an event log or stream
51. Event Sourcing
• Add head
• Add body
• Add left arm
• Add right arm
• Add left leg
• Add right leg
53. Capabilities we've gained from Event Sourcing
• Complete rebuild
• Temporal query
• Event replay
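All three capabilities fall out of one idea: state is a fold over the event log, so a complete rebuild and a temporal query are the same replay with a different slice of the log. A sketch using the slide's events (names illustrative):

```python
events = ["add head", "add body", "add left arm",
          "add right arm", "add left leg", "add right leg"]


def rebuild(log):
    """Derive state by replaying the log from the start (a fold over events)."""
    state = []
    for e in log:
        state.append(e.removeprefix("add "))  # apply each event in order
    return state


print(rebuild(events))      # complete rebuild: the full figure
print(rebuild(events[:2]))  # temporal query: state as of the second event
```

Event replay is the same `rebuild` pointed at a copy of the log, possibly feeding a different (or fixed) version of the apply logic.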
54. What cool things can you do now?
• Add head
• Add body
• Add left arm
• Add right arm
• Add left leg
• Add right leg
56. Obvious shortcomings of Event Sourcing and
how to overcome them
• Time to process the log: checkpointing on a
regular basis
• How to query the state: building a
materialized view
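Both fixes can be sketched together: maintain a materialized view as events arrive, and periodically checkpoint it so a restart replays from the snapshot instead of offset 0. The dict-based "view" stands in for a real store such as RocksDB; all names are invented:

```python
events = [("acct-1", +100), ("acct-2", +50), ("acct-1", -30), ("acct-1", +5)]

view = {}          # materialized view: current balance per account
checkpoint = None  # (offset, snapshot) written periodically

for offset, (account, delta) in enumerate(events):
    view[account] = view.get(account, 0) + delta  # keep the view current
    if offset % 2 == 1:                           # checkpoint every 2 events
        checkpoint = (offset, dict(view))

print(view)        # {'acct-1': 75, 'acct-2': 50} -- queryable without replay
print(checkpoint)  # resume replay from here instead of the log's start
```

Queries hit the view directly; recovery loads the latest snapshot and replays only the events after its offset.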
57. Event Sourcing leads to divergent models for
read and write
This is often addressed with Command Query Responsibility Segregation (CQRS)
Despite these benefits, you should be very cautious about using CQRS. Many
information systems fit well with the notion of an information base that is updated
in the same way that it's read, adding CQRS to such a system can add significant
complexity. I've certainly seen cases where it's made a significant drag on
productivity, adding an unwarranted amount of risk to the project, even in the
hands of a capable team.
-Martin Fowler
58. KStreams can help you do Event Sourcing
• Basically a way to do Event Sourcing without
being an architectural astronaut
• Provides a materialized view (uses RocksDB
internally to hold the table)
• Each application can now have its own view of
the stream
60. A specification for describing event
data in common formats to provide
interoperability across services,
platforms and systems.
61. Why Cloud Events?
• Consistency: the lack of a common way of
describing events means developers must
constantly re-learn how to receive events
• Accessibility: this also limits the potential for
libraries, tooling and infrastructure to aid the
delivery of event data across environments
• Portability: the portability and productivity we
can achieve from event data is hindered overall
62. Sample Cloud Event
• These are the rules for
the envelope
• The data section is
opaque
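The slide's sample image isn't reproduced here, but a hand-written CloudEvents 1.0 envelope looks like the following. The four required attributes come from the spec; all field values (and the payload) are made up for illustration:

```python
import json

event = {
    "specversion": "1.0",                     # required
    "type": "com.example.order.created",      # required
    "source": "/orders/service-a",            # required
    "id": "A234-1234-1234",                   # required
    "time": "2019-04-02T12:00:00Z",           # optional
    "datacontenttype": "application/json",    # optional
    "data": {"orderId": 42, "total": 19.99},  # opaque payload
}

# The envelope rules apply only to the attributes above; intermediaries
# never need to understand the data section.
required = {"specversion", "type", "source", "id"}
assert required <= event.keys()

print(json.dumps(event, indent=2))
```

Routers and brokers can filter and dispatch on the envelope attributes alone, which is what makes the opaque `data` section workable.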
63. Combining Events and Streams
•Events can be fed into a stream
•Stream processors can produce their own events
(Diagram: events feed a stream; a processor f(x) produces new events.)
64. Key differences between events and streams
Events as the records, streams as the
communication mechanism
65. Key differences between events and streams
• Dispatch and how you can do this in Kafka
• Push and other ways to accomplish it
(Diagram: streams are pull-based and fan in; push-based dispatch fans out.)
66. In closing
• Pick the right tool for the job
• You may need multiple tools
• Be realistic about your expectations
• Experiment and learn – continuously
• Share your learnings in contributions, blogs, etc.
• Be an active member of the Apache Kafka
community!