1
Stream Processing
fundamentals and
Introduction to ksqlDB
Vish Srinivasan
Systems Engineer
2
Agenda
● Motivation for Event Streaming ~5
● Kafka 101 ~10
● Stream Processing ~20
1. Kafka Streams
2. KSQL
● ksqlDB - An introduction ~10
● Q&A ~10
3
Event Streaming Motivation
4
The Central Challenge
Connecting applications with data
ETL
What happened
in the world
Messaging
What is happening in
the world
5
Motivation
6
7
EVENT > STATE
EVENT: I changed my job from Snaplogic to Confluent in April 2019.
STATE: I work at Confluent.
8
JOB CHANGE > RECOMMENDATION ENGINE, SEARCH INDEX, EMAIL SERVICE
9
THE USER OF THE SOFTWARE IS MORE SOFTWARE
10
Customers Expect Rich
Digital Experiences
● Real-Time combined with historical data
● Only Event Streaming Platforms can do this
When will my driver
get here?
11
Why Combine Real-time With Historical Context?
Event-Driven App (Location Tracking): “Where is my driver?”
● Only Real-Time Events
● Messaging Queues and Event Streaming Platforms can do this
VS.
Contextual Event-Driven App (ETA): “When will my driver get here?” (2 min)
● Real-Time combined with stored data
● Only Event Streaming Platforms can do this
12
Contextual, Event-Driven Apps
in the Enterprise
“We look at events as running our business. Business people within our
organization want to be able to react to events—and oftentimes it's a
combination of events.”
—Chris D’Agostino, VP of Streaming Data
01 Real-Time Fraud Notifications
02 Real-Time “Second Look”
03 Automated Transaction Analysis
13
Take Away #1
Event Streaming Platforms let you build contextual, event-driven
applications combining real-time and historical data.
14
An Event Streaming Platform
gives you three key functionalities
Publish & Subscribe
to Events
Store
Events
Process & Analyze
Events
15
… But first, Kafka Basics
16
Kafka is a Foundation for Event Streams
[Figure: a LOG of records at offsets 0 to 8; WRITES are appended to the log, and DESTINATION SYSTEM A and DESTINATION SYSTEM B each perform their own READS.]
17
Storage: Distributed and Replicated
[Figure: replicas of the TOPIC 1 and TOPIC 2 partitions (PART 1, PART 2) spread across BROKER 1 to BROKER 4, with a PRODUCER writing and a CONSUMER reading.]
2 topics, 2 partitions each, 3 replicas each
18
Producing to Kafka
19
Producing to Kafka
● Messages without a key are produced in a round-robin fashion
● Each message is written to the leader of a partition
[Figure: messages 1 to 5 spread across the partitions over time.]
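The round-robin behaviour on this slide can be sketched in a few lines of plain Python. This is only a toy model under stated assumptions: `round_robin_assign` is a hypothetical helper, no Kafka client is involved, and real producer clients may batch keyless records differently depending on their version.

```python
from itertools import cycle

def round_robin_assign(messages, num_partitions):
    """Assign keyless messages to partitions in round-robin order (toy model)."""
    partitions = {p: [] for p in range(num_partitions)}
    next_partition = cycle(range(num_partitions))
    for message in messages:
        # Each message goes to the next partition in the cycle.
        partitions[next(next_partition)].append(message)
    return partitions

# Messages 1..5 from the slide, spread over three hypothetical partitions.
layout = round_robin_assign([1, 2, 3, 4, 5], num_partitions=3)
```

With three partitions, messages 1 and 4 share a partition, as do 2 and 5, so ordering is only preserved within each partition.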
20
Producing to Kafka with a Key
hash(key) % numPartitions = N
[Figure: messages with keys A, B, C and D written over time; each key maps to one partition.]
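A minimal sketch of the `hash(key) % numPartitions` rule, in plain Python. It assumes CRC32 as a stand-in hash purely for illustration (Kafka's Java client hashes the serialized key with murmur2), and `partition_for` is a hypothetical helper, not a client API.

```python
import zlib

def partition_for(key, num_partitions):
    """Return hash(key) % numPartitions, using CRC32 as a stable stand-in hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Keys A..D from the slide; A and B appear twice to show stable placement.
keys = ["A", "B", "C", "D", "A", "B"]
placements = [(key, partition_for(key, 4)) for key in keys]
```

Because the hash is deterministic, every record with the same key lands on the same partition, which is what gives per-key ordering. Changing the partition count changes the mapping for most keys.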
21
Consuming from Kafka
22
Consuming with Single Client
[Figure: a single consumer (C).]
23
Consuming with Consumer Groups
● A consumer group is identified by a logical name
● Partitions are load balanced across all consumers in the group
[Figure: four consumers (C) in one consumer group.]
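To make "load balanced across all consumers in the group" concrete, here is a small sketch in plain Python. `assign_partitions` and the consumer names are hypothetical; Kafka's real assignors (range, round-robin, sticky) handle rebalancing and multiple topics, which this toy version does not.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin style."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        # Each partition has exactly one owner inside the group.
        assignment[owner].append(partition)
    return assignment

# Six partitions shared by a group of four, then by a group of two.
group_of_four = assign_partitions([0, 1, 2, 3, 4, 5], ["c1", "c2", "c3", "c4"])
group_of_two = assign_partitions([0, 1, 2, 3, 4, 5], ["c1", "c2"])
```

The point of the sketch is that a partition is read by only one consumer in a group, so adding consumers spreads the load only up to the number of partitions.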
24
Consuming with Consumer Groups
[Figure: two consumer groups, CG1 and CG2, each with its own consumers (C).]
25
Delivery Guarantees
● Producer Guarantees
○ Acks = 0
○ Acks = 1
○ Acks = all
● Consumer Guarantees
○ At least once
○ At most once
○ Exactly once
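The difference between at-most-once and at-least-once consumption comes down to whether the offset is committed before or after a record is processed. The sketch below models that in plain Python under simplifying assumptions: the log is a list, `run_consumer` and `run_with_restart` are hypothetical helpers, and a crash is simulated between the two steps. Exactly-once semantics in Kafka rely on idempotent producers and transactions, which are not modeled here.

```python
def run_consumer(log, start, commit_first, crash_at=None):
    """Consume log from offset start; return (processed records, committed offset).

    commit_first=True  commits, then processes (at most once).
    commit_first=False processes, then commits (at least once).
    crash_at is the offset at which the consumer dies between the two steps.
    """
    seen = []
    committed = start
    for offset in range(start, len(log)):
        if commit_first:
            committed = offset + 1        # step 1: commit
            if offset == crash_at:
                return seen, committed    # dies before processing
            seen.append(log[offset])      # step 2: process
        else:
            seen.append(log[offset])      # step 1: process
            if offset == crash_at:
                return seen, committed    # dies before committing
            committed = offset + 1        # step 2: commit
    return seen, committed

def run_with_restart(log, commit_first, crash_at):
    """Run once with a crash, then restart from the last committed offset."""
    first, committed = run_consumer(log, 0, commit_first, crash_at)
    second, _ = run_consumer(log, committed, commit_first)
    return first + second

log = ["a", "b", "c"]
at_most_once = run_with_restart(log, commit_first=True, crash_at=1)
at_least_once = run_with_restart(log, commit_first=False, crash_at=1)
```

In this run the commit-first consumer never processes "b", while the process-first consumer handles "b" twice, which is why at-least-once consumers usually need idempotent processing.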
26
Take Away #2
Kafka lets you
publish/subscribe to events as
well as store events.
27
An Event Streaming Platform
gives you three key functionalities
Publish & Subscribe
to Events
Store
Events
Process & Analyze
Events
Stream Processing by Analogy
Kafka Cluster
Connect API Stream Processing Connect API
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
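The same analogy can be written as a chain of Python generators, where each stage handles one record at a time, much as a stream processor would. This is an illustrative sketch only: the function names and sample lines are made up, and a list stands in for `in.txt`.

```python
def grep(pattern, lines):
    """Yield only the lines that contain pattern (the grep stage)."""
    for line in lines:
        if pattern in line:
            yield line

def to_upper(lines):
    """Yield each line upper-cased (the tr a-z A-Z stage)."""
    for line in lines:
        yield line.upper()

def run_pipeline(lines):
    """Chain the stages lazily, record by record, then collect the output."""
    return list(to_upper(grep("ksql", lines)))

in_txt = ["intro to ksql", "kafka basics", "ksqldb pull queries"]
out_txt = run_pipeline(in_txt)
```

Reading the input and writing the output correspond to the Connect API in the analogy; the filter and transform stages in between are the stream processing.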
29
Event Transformation with Stream Processing
Kafka Streams
Apache Kafka® library to write real-time applications and
microservices in Java and Scala
Confluent KSQL
The streaming SQL engine for Apache Kafka®
CREATE STREAM fraudulent_payments AS
  SELECT * FROM payments
  WHERE fraudProbability > 0.8;
You write only SQL. No Java, Python, or
other boilerplate to wrap around it!
30
Shoulders of Streaming Giants
Consumer, Producer: subscribe(), poll(), send(), flush(), beginTransaction(), …
Kafka Streams: KStream, KTable, filter(), map(), flatMap(), join(), aggregate(), transform(), …
KSQL: CREATE STREAM, CREATE TABLE, SELECT, JOIN, GROUP BY, SUM, …
KSQL UDFs
31
Topics vs. Streams and Tables
Storage Layer (Brokers)
● Topic: 00100 11101 11000 00011 00100 00110
Processing Layer (KSQL, KStreams)
● Stream (topic plus schema (serdes)): alice Paris, bob Sydney, alice Rome
● Table (stream plus aggregation): alice Rome, bob Sydney
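The step from stream to table on this slide can be sketched with a dictionary in plain Python. The helper names are hypothetical, and "latest value per key" and "count per key" are just two possible aggregations chosen for illustration.

```python
def materialize_latest(stream):
    """Fold a stream of (key, value) events into a table of the latest value per key."""
    table = {}
    for key, value in stream:
        table[key] = value  # a later event for the same key overwrites the earlier one
    return table

def count_by_key(stream):
    """A different aggregation over the same stream: number of events per key."""
    counts = {}
    for key, _ in stream:
        counts[key] = counts.get(key, 0) + 1
    return counts

# The stream from the slide: every event is kept, in order.
stream = [("alice", "Paris"), ("bob", "Sydney"), ("alice", "Rome")]
table = materialize_latest(stream)
visits = count_by_key(stream)
```

The stream still contains all three events, while the table holds only the current state for each key.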
32
Streams record history: “The ledger of Vish’s sales.”
Tables represent state: “Vish’s sales totals.” “California sales totals.”
33
Streams record history: “The sequence of moves.”
1. e4 e5
2. Nf3 Nc6
3. Bc4 Bc5
4. d3 Nf6
5. Nbd2
Tables represent state: “The state of the board.”
34
Another analogy: Behavioral psychology
35
Streams: topic with schema
● Processing is partitioned
● Unit of parallelism is stream-task
Tables: underlying topic (usually) compacted
● Materialized view, cannot be mutated
● Implemented on top of a state-store (mutable)
36
Take Away #3
2 tools to process data: Kafka
Streams and KSQL
2 concepts in both: Streams
and Tables.
37
… Now, ksqlDB
38
KSQL for Real-Time Monitoring
● Log data monitoring
● Tracking and alerting
● Syslog data
● Sensor / IoT data
● Application metrics
CREATE STREAM syslog_invalid_users AS
SELECT host, message
FROM syslog
WHERE message LIKE '%Invalid user%';
http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
39
KSQL for Anomaly Detection
● Identify patterns or anomalies in real-time data, surfaced in milliseconds
CREATE TABLE possible_fraud AS
SELECT card_number, COUNT(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING COUNT(*) > 3;
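The logic of the tumbling-window query can be sketched in plain Python. This is a batch approximation over a finite list, assuming integer timestamps in seconds; `possible_fraud` is a hypothetical function, and a real KSQL query would update its result continuously and deal with late-arriving events.

```python
from collections import Counter

def possible_fraud(attempts, window_size_s=5, threshold=3):
    """Count attempts per (card_number, window) and keep counts above threshold.

    attempts is an iterable of (timestamp_seconds, card_number) pairs.
    """
    counts = Counter()
    for timestamp, card_number in attempts:
        # Tumbling windows do not overlap: each event falls into exactly one.
        window_start = (timestamp // window_size_s) * window_size_s
        counts[(card_number, window_start)] += 1
    return {key: n for key, n in counts.items() if n > threshold}

attempts = [
    (0, "1111"), (1, "1111"), (2, "1111"), (4, "1111"),  # four attempts in window [0, 5)
    (3, "2222"),                                         # one attempt in window [0, 5)
    (6, "1111"), (7, "1111"),                            # two attempts in window [5, 10)
]
flagged = possible_fraud(attempts)
```

Only card "1111" in the first window exceeds three attempts; its two attempts in the next window start a fresh count.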
40
KSQL for Streaming ETL
● Joining, filtering, and aggregating streams of event data
CREATE STREAM vip_actions AS
SELECT c.user_id, page, action
FROM clickstream c
LEFT JOIN users u
ON c.user_id = u.user_id
WHERE u.level = 'Platinum';
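A stream-table join like this one can be pictured as a lookup against the current table state for every incoming event. The sketch below uses plain Python dictionaries; `vip_actions` and the sample records are hypothetical and only mirror the column names used in the query.

```python
def vip_actions(clickstream, users):
    """Enrich each click with the user's row, then keep Platinum users only."""
    results = []
    for click in clickstream:
        user = users.get(click["user_id"])        # LEFT JOIN: the user may be missing
        level = user["level"] if user else None
        if level == "Platinum":                   # WHERE u.level = 'Platinum'
            results.append({
                "user_id": click["user_id"],
                "page": click["page"],
                "action": click["action"],
            })
    return results

users = {
    "u1": {"level": "Platinum"},
    "u2": {"level": "Gold"},
}
clickstream = [
    {"user_id": "u1", "page": "/home", "action": "view"},
    {"user_id": "u2", "page": "/cart", "action": "add"},
    {"user_id": "u3", "page": "/home", "action": "view"},  # no matching user row
]
vip = vip_actions(clickstream, users)
```

Note that filtering on a column from the right-hand side drops clicks with no matching user, so here the left join ends up behaving like an inner join.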
41
KSQL is a stream processing technology
As such it is not yet a great fit for:
Ad-hoc queries
● No indexes yet in KSQL
● Kafka is often configured to retain data for only a limited span of time
BI reports (Tableau etc.)
● No indexes yet in KSQL
● No JDBC
● Most BI tools don’t understand continuous, streaming results
42
PUSH: the APP receives every update as it happens
● Jay’s credit score is 670
● Jay’s credit score is 710
● Jay’s credit score is 695
PULL: the APP asks “What is Jay’s credit score now?” and receives the current value
● 695
43
PUSH
SELECT user, credit_score
FROM credit_history
WHERE ROWKEY = 'jay'
EMIT CHANGES;
PULL
SELECT user, credit_score
FROM credit_history
WHERE ROWKEY = 'jay';
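The contrast between the two query styles can be modeled with a small in-memory view in plain Python. `CreditScoreView` and its methods are hypothetical names for illustration, not part of any ksqlDB client: `subscribe` plays the role of a push query (`EMIT CHANGES`) and `lookup` the role of a pull query.

```python
class CreditScoreView:
    """Toy materialized view that supports push subscriptions and pull lookups."""

    def __init__(self):
        self._state = {}
        self._subscribers = {}

    def subscribe(self, key, callback):
        """Push style: callback is invoked on every future change for key."""
        self._subscribers.setdefault(key, []).append(callback)

    def apply(self, key, value):
        """Apply one event: update the state, then notify subscribers."""
        self._state[key] = value
        for callback in self._subscribers.get(key, []):
            callback(key, value)

    def lookup(self, key):
        """Pull style: return the current value once, or None if unknown."""
        return self._state.get(key)

view = CreditScoreView()
pushed = []
view.subscribe("jay", lambda user, score: pushed.append(score))
for score in (670, 710, 695):
    view.apply("jay", score)
current = view.lookup("jay")
```

The subscriber sees all three scores in order, while the lookup returns only the latest one.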
44
ksqlDB adds two key features to augment KSQL
PULL QUERIES
● Point-in-time lookup of information
● Comparable to a SELECT statement in a relational database
EMBEDDED CONNECTORS
● Move event data to and from external data systems
● Available for all supported connectors
[Figure: CONNECTORs link external systems with ksqlDB; an APP issues a PULL query, “How much does Jay’s ride cost?”, and receives “$25”.]
46
So, what use cases is ksqlDB a good fit for?
It does not replace traditional databases:
● What is a database?
● Materialize events into an opinionated structure (table) so you get the power of SQL
● When we query, we are querying the state produced by the processor executing the
commit log; in effect, we have just recreated materialized views.
47
So, what use cases is ksqlDB a good fit for?
ksqlDB is primarily useful for three broad categories of applications:
● Building and serving materialized views that power apps
● Creating real-time streaming apps that react to event streams and trigger side effects
● Creating real-time streaming pipelines that continuously transform event streams
48
Summary Takeaways
● Event Streaming Platforms let you build contextual, event-driven applications combining real-time and historical data
● Kafka lets you publish/subscribe to events and also store them
● Process data with Kafka Streams or KSQL using Streams and Tables
● ksqlDB makes it easy to build and serve materialized views that power apps
49
Thank You!
Reach out if you have any questions:
● Vish Srinivasan - vish@confluent.io
Community Slack: https://launchpass.com/confluentcommunity
Learn Kafka - https://kafka-tutorials.confluent.io/
ksqlDB - https://ksqldb.io/

Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)
