5 Fabulous OSS Sinks for Kafka
#3 will surprise you!
Rachel Pedreschi
Senior Director, Worldwide Field Engineering and Community
rachel@imply.io
Polyglot Persistence
Who am I?
Just for Fun*
https://tinyurl.com/KafkaSummit2019
What are you gonna tell ‘em?
1. How did we get to Kafka? Haven’t we been here before?
2. Five Kafka use cases and their OSS sinks
3. Popular examples of each of these sinks
4. Compare and contrast: polyglot persistence FTW!
Evolution of Streaming Data
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
Evolution of Streaming Data
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
Death of OLAP vs OLTP
https://www.slideshare.net/KaiWaehner/apache-kafka-vs-integration-middleware-mq-etl-esb
ESB
MQ
Uptime
Transactions
ACID
Applications
Speed
Short Request
ETL
Reporting
Analytics
Cubes
Batch
Long Request
OLTP
OLAP
Scalable
It’s Just Data Now
https://www.slideshare.net/KaiWaehner/apache-kafka-vs-integration-middleware-mq-etl-esb
ESB
MQ
Uptime
Transactions
ACID
Applications
Speed
Short Request
ETL
Reporting
Analytics
Cubes
Batch
Long Request
PP
It’s Just Data Now
https://www.slideshare.net/KaiWaehner/apache-kafka-vs-integration-middleware-mq-etl-esb
Uptime
Transactions
ACID
Applications
Speed
Short Request
Reporting
Analytics
Cubes
Batch
Long Request
PP
Top 5 Use Cases for Kafka…
https://kafka.apache.org/uses
…And their 5 Fabulous OSS Sinks
#1 Kafka as a Message Broker and Event
Source for Always On applications with
Apache Cassandra
Real Time Message Delivery and Event Sourcing
https://www.confluent.io/blog/kafka-connect-cassandra-sink-the-perfect-match/
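As a rough illustration of the event sourcing idea named on this slide, the sketch below folds an ordered stream of events into current state per key. It is plain Python with no Kafka or Cassandra client involved; the `Event` shape, the event kinds, and the account keys are all hypothetical and chosen only for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape: the key plays the role of a Kafka message
    # key and, on the sink side, a Cassandra partition key.
    key: str
    kind: str    # "deposit" or "withdraw" (illustrative only)
    amount: int


def replay(events):
    """Fold an ordered event stream into the current state per key."""
    state = defaultdict(int)
    for e in events:
        if e.kind == "deposit":
            state[e.key] += e.amount
        elif e.kind == "withdraw":
            state[e.key] -= e.amount
        else:
            raise ValueError(f"unknown event kind: {e.kind}")
    return dict(state)


log = [
    Event("acct-1", "deposit", 100),
    Event("acct-2", "deposit", 50),
    Event("acct-1", "withdraw", 30),
]
balances = replay(log)
print(balances)  # {'acct-1': 70, 'acct-2': 50}
```

Because the state is derived entirely from the log, replaying the same events always rebuilds the same table, which is what makes a keyed store a natural sink for this pattern.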
#2 Metric collection for realtime
monitoring with InfluxDB
What is InfluxDB?
https://docs.influxdata.com/influxdb/v1.7/
Metric Collection
• Use case: Ensure quality of streaming services
• Data set: They have more than 30 front-facing applications and portals (set-top boxes, etc.). To ensure the quality of service of these applications, data is collected from the sources and surfaced on an internal, customized real-time dashboard. Metrics tracked include day-over-day quality, content analytics, and bitrates by geographic region.
• Business goal: Viewing real-time data on how their services are behaving lets their operations teams take action to maintain a high quality of service for their users.
https://speakerdeck.com/implydatainc/druid-at-charter
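To make the metric shape concrete, here is a minimal sketch that formats one data point in InfluxDB line protocol (`measurement,tags fields timestamp`). The measurement name, tag names, and field names are assumptions invented for the example, not taken from the talk, and the escaping covers only the common cases.

```python
def _escape(value, chars):
    """Backslash-escape the given characters, as line protocol requires."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value field=value timestamp."""
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # tags are written in sorted key order
        head += "," + _escape(key, ",= ") + "=" + _escape(tags[key], ",= ")
    parts = []
    for key, value in fields.items():
        if isinstance(value, bool):      # check bool before int
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = f"{value}i"       # integers carry an "i" suffix
        elif isinstance(value, float):
            rendered = repr(value)
        else:                            # strings are double-quoted
            text = str(value).replace("\\", "\\\\").replace('"', '\\"')
            rendered = '"' + text + '"'
        parts.append(_escape(key, ",= ") + "=" + rendered)
    return f"{head} {','.join(parts)} {timestamp_ns}"


# Hypothetical metric: bitrate reported by a set-top application, by region.
line = to_line_protocol(
    "stream_quality",
    {"region": "us west", "app": "set-top"},
    {"bitrate_kbps": 4200, "buffering": False},
    1569888000000000000,
)
print(line)
```

Tags such as region are indexed and suit the "by geographic region" breakdowns mentioned above, while the measured values go in fields.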
Metrics and Event Sourcing
Box emitting metrics
https://speakerdeck.com/implydatainc/druid-at-charter
#3 Stream Processing with Apache
Spark
What is Spark Streaming?
https://spark.apache.org/docs/latest/streaming-programming-guide.html
Stream Processing
• Use case: Collect, analyze, and diagnose network flow data for visualizing traffic among all possible pairs of sources and destinations in a given internet domain
• Data set: Enriched network flows (application and network behavior)
• Goals:
○ Provide real-time visibility for capacity planning, traffic engineering, resource optimization, revenue leak detection
○ Proactively identify and rapidly resolve issues
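The pairwise traffic view in this use case can be sketched as a micro-batch aggregation, the processing style Spark Streaming is built around. The code below is plain Python rather than the Spark API, and the flow record shape `(timestamp_seconds, src, dst, byte_count)` is an assumption made for illustration.

```python
from collections import defaultdict


def aggregate_flows(flows, batch_seconds):
    """Sum bytes per (source, destination) pair within fixed time batches.

    Each flow is a (timestamp_seconds, src, dst, byte_count) tuple; this
    record shape is hypothetical.
    """
    batches = defaultdict(lambda: defaultdict(int))
    for ts, src, dst, nbytes in flows:
        batch_start = ts - (ts % batch_seconds)  # start of the flow's batch
        batches[batch_start][(src, dst)] += nbytes
    return {start: dict(pairs) for start, pairs in sorted(batches.items())}


flows = [
    (0, "10.0.0.1", "10.0.0.2", 500),
    (3, "10.0.0.1", "10.0.0.2", 700),
    (4, "10.0.0.3", "10.0.0.2", 100),
    (11, "10.0.0.1", "10.0.0.2", 50),
]
traffic = aggregate_flows(flows, batch_seconds=10)
for start, pairs in traffic.items():
    print(start, pairs)
```

In a real deployment the batches would arrive continuously from Kafka and the per-pair totals would be pushed to a sink; here the whole input is a list so the grouping logic is easy to follow.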
Spark Streaming vs Kafka Streams
https://www.cuelogic.com/blog/analyzing-data-streaming-using-spark-vs-kafka
#4 Log Aggregation for real time search
with Elasticsearch
What is Elasticsearch?
Elasticsearch is an open-source, RESTful, distributed search and analytics engine built on
Apache Lucene. Since its release in 2010, Elasticsearch has quickly become the most popular
search engine, and is commonly used for log analytics, full-text search, security
intelligence, business analytics, and operational intelligence use cases.
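Since Elasticsearch is driven over REST, a log pipeline typically ships documents in batches. As a hedged sketch, the function below builds the newline-delimited JSON body that the `_bulk` endpoint accepts, assuming a recent Elasticsearch version where mapping types are no longer needed. The index name `app-logs` and the log fields are hypothetical, and no HTTP request is actually sent.

```python
import json


def bulk_index_body(index_name, docs):
    """Build a newline-delimited JSON body for the _bulk endpoint.

    Each document is preceded by an action line that names the target index.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


body = bulk_index_body("app-logs", [
    {"level": "ERROR", "message": "timeout calling payments"},
    {"level": "INFO", "message": "user logged in"},
])
print(body)
```

The resulting string would be sent as the body of a POST to `/_bulk` with the `application/x-ndjson` content type.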
Log Analytics
“Logs… are the heartbeats of our tech stack. They give us insight
into how users interact with us. They provide real time application
intelligence. For that reason we built a robust set of data
infrastructure that can handle large volume of logs from all our
applications, and allow for real time search as well as batch
processing.”
https://hackernoon.com/distributed-log-analytics-using-apache-kafka-kafka-connect-and-fluentd-303330e478af
#5 Website Activity Tracking with Apache
Druid
What is Druid?
high performance
analytics data store for
event-driven data
The problem
Confidential. Do not redistribute.
Search platform
• Real-time ingestion
• Flexible schema
• Full text search
OLAP
• Batch ingestion
• Efficient storage
• Fast analytic queries
Timeseries database
• Optimized storage for time-based datasets
• Time-based functions
Data warehouses
Tightly coupled architecture with limited flexibility.
[Diagram: data sources feed ETL (processing) into a data warehouse (store and compute), which serves analytics, reporting, data mining, and querying.]
Data lakes
Modern data architectures are more application-centric.
[Diagram: data sources land in data lake storage; MapReduce and Spark apps, ETL, SQL, ML/AI, and TSDB workloads run against it.]
Data rivers
Streaming architectures are true-to-life and enable faster decision cycles.
[Diagram: data sources feed a stream hub; stream processors, streaming analytics, databases, ETL, storage, and apps consume from it, with an archive to the data lake.]
Website Activity Monitoring
https://imply.io/post/clickstream-analysis-open-source-divolte-kafka-druid
• Is our campaign working, right now?
• Are we getting more visitors today than we did this time last week?
• Is now a good time to publish content changes targeted at a particular geography?
• Should we target adverts to a different referring website today?
• Have yesterday’s SSO changes made the impact we’ve been looking for?
• How has the release of a new browser this week affected our customer profile? Do we need
to adapt our website code?
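One of the questions above, whether there are more visitors today than at this time last week, can be sketched as a comparison of distinct visitor counts over two matching time spans. This is plain Python over an in-memory list, not a Druid query; the `(timestamp, visitor_id)` event shape and the sample data are assumptions for illustration.

```python
from datetime import datetime, timedelta


def visitors_between(events, start, end):
    """Count distinct visitor IDs with a page view in [start, end)."""
    return len({visitor for ts, visitor in events if start <= ts < end})


def today_vs_last_week(events, now):
    """Compare visitors so far today with the same span one week earlier."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    week = timedelta(days=7)
    today = visitors_between(events, midnight, now)
    last_week = visitors_between(events, midnight - week, now - week)
    return today, last_week


now = datetime(2019, 10, 1, 12, 0)
events = [
    (datetime(2019, 10, 1, 9, 0), "a"),
    (datetime(2019, 10, 1, 10, 0), "b"),
    (datetime(2019, 10, 1, 11, 0), "a"),   # repeat visitor, counted once
    (datetime(2019, 9, 24, 9, 30), "c"),
    (datetime(2019, 9, 24, 15, 0), "d"),   # later than "this time last week"
]
print(today_vs_last_week(events, now))  # (2, 1)
```

Answering this "right now" requires the store to serve both freshly ingested and week-old events in one query, which is the property the clickstream use case relies on.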
The demo
Druid vs Cassandra
https://imply.io/post/apache-cassandra-vs-apache-druid
Always-on, fast, scalable applications built on single-partition reads
vs
Low-latency OLAP and ad hoc queries over entire datasets
Need seamless multi-data-center replication?
Are query patterns ad hoc or unknown?
Need to do full table scans?
Have more writes than reads and need millisecond response time?
InfluxDB vs Druid
https://imply.io/post/apache-druid-vs-time-series-databases
Fast timeseries reads and writes
vs
distributed OLAP style analytics
Simple aggregations / counters?
Group on non-time based tags or attributes?
Slice and dice on your metrics arbitrarily?
Need a single node or have a small amount of data?
Spark Streaming vs Druid
Stream processing
vs
fast SQL queries on historical and real time data
Need the full power of Scala to do transformations?
Want to query real time and historical data?
Looking for tiered storage and automatic backups?
Want to push results to other systems directly?
Druid vs Elasticsearch
Search vs Analytics
Complex ad hoc queries?
Text prediction?
Aggregation at ingestion?
Totally unstructured data?
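The "aggregation at ingestion" question refers to pre-aggregating events as they arrive, which Druid calls rollup. A minimal sketch of the idea, in plain Python rather than Druid configuration, is shown below; the `(timestamp_seconds, page, country)` event shape and the hourly granularity are assumptions for the example.

```python
from collections import Counter


def rollup(events, granularity_seconds):
    """Pre-aggregate raw events into one row per (time bucket, dimensions)."""
    counts = Counter()
    for ts, page, country in events:
        bucket = ts - (ts % granularity_seconds)  # truncate to bucket start
        counts[(bucket, page, country)] += 1
    return [
        {"time": b, "page": p, "country": c, "count": n}
        for (b, p, c), n in sorted(counts.items())
    ]


raw = [
    (0, "/home", "US"),
    (10, "/home", "US"),
    (20, "/pricing", "DE"),
    (3700, "/home", "US"),
]
rows = rollup(raw, granularity_seconds=3600)
for row in rows:
    print(row)
```

Four raw events collapse to three stored rows here. The trade-off is that individual events can no longer be retrieved, which is one reason a search engine and an analytics store answer this question differently.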
Final Thoughts
Photo taken by me on my iPhone @Candytopia SF
Stay in touch
Follow the Druid project on Twitter! @druidio
Join the community! http://druid.apache.org/
rachel@imply.io


Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedreschi, Imply Data) Kafka Summit SF 2019

  • 1. 5 Fabulous OSS Sinks for Kafka #3 will surprise you! Rachel Pedreschi Senior Director, Worldwide Field Engineering and Community rachel@imply.io
  • 5. What are you gonna tell ‘em? 5 1.How did we get to kafka? Haven’t we been here before? 2.Five Kafka use cases and their OSS sinks. 3.Popular examples of each of these sinks 4.Compare and Contrast- Polyglot Persistence FTW!
  • 6. Evolution of Streaming Data 6https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
  • 7. Evolution of Streaming Data 7https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
  • 8. Death of OLAP vs OLTP 8https://www.slideshare.net/KaiWaehner/apache-kafka-vs-integration-middleware-mq-etl-esb ESB MQ Uptime Transactions ACID Applications Speed Short Request ETL Reporting Analytics Cubes Batch Long Request POLT OLAPScalable
  • 9. Its Just Data Now 9https://www.slideshare.net/KaiWaehner/apache-kafka-vs-integration-middleware-mq-etl-esb ESB MQ Uptime Transactions ACID Applications Speed Short Request ETL Reporting Analytics Cubes Batch Long Request PP
  • 10. Its Just Data Now 10https://www.slideshare.net/KaiWaehner/apache-kafka-vs-integration-middleware-mq-etl-esb Uptime Transactions ACID Applications Speed Short Request Reporting Analytics Cubes Batch Long Request PP
  • 11. Top 5 Use Cases for Kafka… 11https://kafka.apache.org/uses
  • 12. …And their 5 Fabulous OSS Sinks 12
  • 13. #1 Kafka as a Message Broker and Event Source for Always On applications with Apache Cassandra 13
  • 14. 14
  • 15. 15
  • 16. 16
  • 17. 17
  • 18. 18
  • 19. 19
  • 20. 20
  • 21. 21
  • 22. 22
  • 23. 23
  • 24. 24
  • 25. 25
  • 26. 26
  • 27. 27
  • 28. Real Time Message Delivery and Event Sourcing https://www.confluent.io/blog/kafka-connect-cassandra-sink-the-perfect-match/
  • 29. #2 Metric collection for realtime monitoring with InfluxDB
  • 31. Metric Collection. Use case: Ensure quality of streaming services. Data set: They have 30+ front-facing applications and portals (set-top boxes, etc.). To ensure the quality of service of these applications, data is collected from the sources and surfaced to an internal customized real-time user dashboard interface. Metrics tracked include day-over-day quality, content analytics, and bitrates by geographic region. Business goal: Viewing real-time data on how their services are behaving allows their operations teams to take action to maintain a high quality of service for their users. https://speakerdeck.com/implydatainc/druid-at-charter
  • 32. Metrics and Event Sourcing: Box emitting metrics https://speakerdeck.com/implydatainc/druid-at-charter
  • 33. #3 Stream Processing with Apache Spark
  • 34. What is Spark Streaming? https://spark.apache.org/docs/latest/streaming-programming-guide.html
  • 35. Stream Processing. Use case: Collect, analyze, and diagnose network flow data for visualizing traffic among all possible pairs of sources and destinations in a given internet domain. Data set: Enriched network flows (application and network behavior). Goals: (1) Provide real-time visibility for capacity planning, traffic engineering, resource optimization, and revenue leak detection; (2) proactively identify and rapidly resolve issues.
  • 37. Spark Streaming vs Kafka Streams
  • 38. Spark Streaming vs Kafka Streams https://www.cuelogic.com/blog/analyzing-data-streaming-using-spark-vs-kafka
  • 39. Spark Streaming vs Kafka Streams
  • 40. #4 Log Aggregation for real time search with Elasticsearch
  • 41. What is Elasticsearch? Elasticsearch is an open-source, RESTful, distributed search and analytics engine built on Apache Lucene. Since its release in 2010, Elasticsearch has quickly become the most popular search engine, and is commonly used for log analytics, full-text search, security intelligence, business analytics, and operational intelligence use cases.
  • 42. Log Analytics https://hackernoon.com/distributed-log-analytics-using-apache-kafka-kafka-connect-and-fluentd-303330e478af “Logs… are the heartbeats of our tech stack. They give us insight into how users interact with us. They provide real time application intelligence. For that reason we built a robust set of data infrastructure that can handle large volume of logs from all our applications, and allow for real time search as well as batch processing.”
  • 44. #5 Website Activity Tracking with Apache Druid
  • 45. What is Druid? A high performance analytics data store for event-driven data
  • 48. Confidential. Do not redistribute. Search platform: real-time ingestion, flexible schema, full text search. OLAP: batch ingestion, efficient storage, fast analytic queries. Timeseries database: optimized storage for time-based datasets, time-based functions.
  • 49. Data warehouses: Tightly coupled architecture with limited flexibility. [Diagram labels: Data sources; ETL; Data warehouse; Processing; Store and Compute; Analytics; Reporting; Data mining; Querying]
  • 50. Data lakes: Modern data architectures are more application-centric. [Diagram labels: Data sources; MapReduce, Spark; Apps; ETL; SQL; ML/AI; TSDB; Data lake; Storage]
  • 51. Data rivers: Streaming architectures are true-to-life and enable faster decision cycles. [Diagram labels: Data sources; Stream processors; Stream hub; Streaming analytics; Databases; ETL; Storage; Apps; Archive to data lake]
  • 52. Website Activity Monitoring https://imply.io/post/clickstream-analysis-open-source-divolte-kafka-druid • Is our campaign working, right now? • Are we getting more visitors today than we did this time last week? • Is now a good time to publish content changes targeted at a particular geography? • Should we target adverts to a different referring website today? • Have yesterday’s SSO changes made that impact we’ve been looking for? • How has the release of a new browser this week affected our customer profile? Do we need to adapt our website code?
  • 56. Druid vs Cassandra https://imply.io/post/apache-cassandra-vs-apache-druid Always-on, fast, scalable applications on single partition reads vs low latency OLAP and ad hoc queries over entire datasets. Need seamless multi data center replication? Are query patterns ad hoc or unknown? Need to do full table scans? Have more writes than reads and need millisecond response time?
  • 57. InfluxDB vs Druid https://imply.io/post/apache-druid-vs-time-series-databases Fast timeseries reads and writes vs distributed OLAP style analytics. Simple aggregations / counters? Group on non-time based tags or attributes? Slice and dice on your metrics arbitrarily? Need a single node or have a small amount of data?
  • 58. Spark Streaming vs Druid: Stream processing vs fast SQL queries on historical and real time data. Need the full power of Scala to do transformations? Want to query real time and historical data? Looking for tiered storage and automatic backups? Want to push results to other systems directly?
  • 59. Druid vs Elasticsearch: Search vs Analytics. Complex ad hoc queries? Text prediction? Aggregation at ingestion? Totally unstructured data?
  • 60. Final Thoughts. Photo taken by me on my iPhone @Candytopia SF
  • 61. Stay in touch. Join the community! http://druid.apache.org/ Follow the Druid project on Twitter! @druidio rachel@imply.io
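
The stream-processing use case on slide 35 (traffic among all pairs of sources and destinations) comes down to a running aggregate keyed by (source, destination). The sketch below is not from the talk: the flow record shape, the function names `traffic_matrix` and `top_pairs`, and the sample addresses are all hypothetical. It uses an in-memory list in place of a Kafka topic, so it only illustrates the aggregation step a stream processor would perform.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical flow record: (source, destination, bytes transferred).
# The field layout is an assumption for illustration, not the talk's schema.
Flow = Tuple[str, str, int]
Pair = Tuple[str, str]


def traffic_matrix(flows: Iterable[Flow]) -> Dict[Pair, int]:
    """Sum bytes for every (source, destination) pair seen in the stream."""
    totals: Dict[Pair, int] = defaultdict(int)
    for src, dst, nbytes in flows:
        totals[(src, dst)] += nbytes
    return dict(totals)


def top_pairs(matrix: Dict[Pair, int], n: int = 3) -> List[Tuple[Pair, int]]:
    """Return the n busiest pairs, largest byte count first."""
    return sorted(matrix.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Stand-in for records consumed from a topic (made-up sample data).
sample: List[Flow] = [
    ("10.0.0.1", "10.0.0.2", 500),
    ("10.0.0.1", "10.0.0.3", 200),
    ("10.0.0.1", "10.0.0.2", 300),
    ("10.0.0.3", "10.0.0.1", 50),
]

matrix = traffic_matrix(sample)
busiest = top_pairs(matrix, n=1)
```

In a real deployment the list would be replaced by a consumer loop or a windowed streaming job, and the totals would be written to a sink rather than held in a dictionary.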