SlideShare una empresa de Scribd logo
1 de 26
Descargar para leer sin conexión
© 2019 Ververica
Stephan Ewen
CTO @ Ververica, Apache Flink PMC
A Unified Analytics Platform with Apache Flink and
Apache Kafka
a.k.a. Unified Streaming & Batch Processing
© 2019 Ververica
2
Apache Kafka and Apache Flink
© 2019 Ververica
3
Apache Kafka and Apache Flink
Log Storage File Storage
DB/Table
s
JDB
C
Debezium
Analytics and Applications on
Streaming Data &
Data-at-Rest
© 2019 Ververica
4
Flink Runtime
Stateful Stream- and Batch Processing
DataStream API StateFun
SQL & Table API
more declarative
more explicit
control
© 2019 Ververica
5
How big can you go? – Alibaba Singles Day
Search Rec. Security
BI
Ads
incl. sub-second updates to the GMV dashboard
Real-time Data Applications
Infrastructure
>5K
nodes
Data Size
985PB
Throughput (Peak)
2.5B
events/sec
Latency
Sub-sec
State Size (Biggest)
100TB
>500K
CPU cores
© 2019 Ververica
6
How small can you go? - U-Hopper FogGuru
Cluster of 5 Raspberry Pi 3b+ Data volume: 800 events/sec
Docker Swarm + Flink + Mosquitto
“The Fridge”
© 2019 Ververica
Streaming & Batch SQL
= SQL!
© 2019 Ververica
8
Our Sample Setup
Real-time
Queries
real-time
events
historical data
ingest
Historical
Queries
Combined Queries
© 2019 Ververica
9
SQL – Static Data (Batch Case)
user cnt
Mary 2
Bob 1
Liz 1
SELECT
user,
COUNT(url) as cnt
FROM clicks
GROUP BY user
user cTime url
Mary 12:00:00 https://…
Bob 12:00:00 https://…
Mary 12:00:02 https://…
Liz 12:00:03 https://…
© 2019 Ververica
10
SQL – Static Data (Streaming Case)
user cTime url
user cnt
SELECT
user,
COUNT(url) as cnt
FROM clicks
GROUP BY user
Mary 12:00:00 https://…
Bob 12:00:00 https://…
Mary 12:00:02 https://…
Liz 12:00:03 https://…
Bob 1
Liz 1
Mary 1
Mary 2
© 2019 Ververica
11
Demo Time
© 2019 Ververica
Stream to Batches and Back Again
© 2019 Ververica
13
ye olde stuff new events
2020-4-1
12:00 am
2016-4-1
1:00 am
2016-4-1
2:00 am
2016-4-7
11:00pm
2016-4-7
10:00pm
…
Inges
t
© 2019 Ververica
14
ye olde stuff new events
2020-4-1
12:00 am
2016-4-1
1:00 am
2016-4-1
2:00 am
2016-4-7
11:00pm
2016-4-7
10:00pm
…
Real-tim
e
Queries
Historical
Queries
Inges
t
© 2019 Ververica
15
Why combine data in Kafka and in Object Stores?
•Why not just longer Kafka retention?
•Sample Numbers from a petabyte-scale Flink + Kafka user
– Compressed columnar data avg. 5x smaller
– Object Store 3-4x cost cheaper than persistent volumes (EBS)
– Object Store already replicated (cross AZ), replaces broker replication (3x)
🡪 ~50x cheaper data storage
•Faster access to compressed columnar data for
– Higher read parallelism
– Read optimizations: Predicate pushdowns, projection pushdowns, partition pruning, etc.
© 2019 Ververica
16
ye olde stuff new events
2020-4-1
12:00 am
2016-4-1
1:00 am
2016-4-1
2:00 am
2016-4-7
11:00pm
2016-4-7
10:00pm
…
Inges
t
while we are at it…
…normalize our data on the way
and convert currencies.
© 2019 Ververica
17
Temporal Join
Symbol Exchange Rate Timestamp Order Timestamp Currency
EUR 1.0 12.00 A @ 17.00 $ 12:05 USD
USD 0.88 12:01 B @ 12.32 £ 12:11 GBP
USD 0.90 12:03
GBP 1.17 12:04
USD 0.89 12:12
GBP 1.20 12:13
USD 0.87 12:17
C @ 111.51 $ 12:01 USD
D @ 2.39 $ 12:02 USD
E @ 17.11 $ 12:20 USD
F @ 243.50 £ 12:15 GBP
G @ 3.49 $ 12:10 USD
H @ 0.99 £ 12:16 GBP
© 2019 Ververica
18
Temporal Join
Symbol Exchange Rate Timestamp Order Timestamp Currency
EUR 1.0 12.00 A @ 17.00 $ 12:05 USD
B @ 12.32 £ 12:11 GBP
GBP 1.17 12:04
USD 0.90 12:03 C @ 111.51 $ 12:01 USD
D @ 2.39 $ 12:02 USD
E @ 17.11 $ 12:20 USD
F @ 243.50 £ 12:15 GBP
G @ 3.49 $ 12:07 USD
H @ 0.99 £ 12:16 GBP
@ 12:07h
Symbol Exchange Rate Timestamp
EUR 1.0 12.00
GBP 1.20 12:13
USD 0.89 12:12
@ 12:15h
© 2019 Ververica
19
Demo Time
© 2019 Ververica
Connecting the Streams
© 2019 Ververica
21
ye olde stuff new events
2020-4-1
12:00 am
2016-4-1
1:00 am
2016-4-1
2:00 am
2016-4-7
11:00pm
2016-4-7
10:00pm
…
Bootstrap Streaming Queries
Inges
t
© 2019 Ververica
22
Demo Time
© 2019 Ververica
23
Event Time in
Kafka Topic
Event time
Data (orders / currency rates)
Temporal Join
Time bucketing
on file system
Time partition pruning
for queries
Stitching together
Stream again from
files and stream tail
© 2019 Ververica
EOF
© 2019 Ververica
25
Apache Kafka and Apache Flink
© 2019 Ververica
26
Everything is a Stream
Bounded & Unbounded
Batch Processing complements
Stream Processing
(Compressed Columnar) File
Storage complements Log Storage
make the most of your
(event) time
SQL generalizes well
across batch & streaming
a unified engine makes
things easier

Más contenido relacionado

La actualidad más candente

Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...HostedbyConfluent
 
DataOps Automation for a Kafka Streaming Platform (Andrew Stevenson + Spiros ...
DataOps Automation for a Kafka Streaming Platform (Andrew Stevenson + Spiros ...DataOps Automation for a Kafka Streaming Platform (Andrew Stevenson + Spiros ...
DataOps Automation for a Kafka Streaming Platform (Andrew Stevenson + Spiros ...HostedbyConfluent
 
Cloud Connect 2012, Big Data @ Netflix
Cloud Connect 2012, Big Data @ NetflixCloud Connect 2012, Big Data @ Netflix
Cloud Connect 2012, Big Data @ NetflixJerome Boulon
 
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul MasterCornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul MasterSpark Summit
 
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...HostedbyConfluent
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...HostedbyConfluent
 
Big Data Kappa | Mark Senerth, The Walt Disney Company - DMED, Data Tech
Big Data Kappa | Mark Senerth, The Walt Disney Company - DMED, Data TechBig Data Kappa | Mark Senerth, The Walt Disney Company - DMED, Data Tech
Big Data Kappa | Mark Senerth, The Walt Disney Company - DMED, Data TechHostedbyConfluent
 
Netflix incloudsmarch8 2011forwiki
Netflix incloudsmarch8 2011forwikiNetflix incloudsmarch8 2011forwiki
Netflix incloudsmarch8 2011forwikiKevin McEntee
 
Personalization Journey: From Single Node to Cloud Streaming
Personalization Journey: From Single Node to Cloud StreamingPersonalization Journey: From Single Node to Cloud Streaming
Personalization Journey: From Single Node to Cloud StreamingDatabricks
 
Maximize the Business Value of Machine Learning and Data Science with Kafka (...
Maximize the Business Value of Machine Learning and Data Science with Kafka (...Maximize the Business Value of Machine Learning and Data Science with Kafka (...
Maximize the Business Value of Machine Learning and Data Science with Kafka (...confluent
 
The Netflix data platform: Now and in the future by Kurt Brown
The Netflix data platform: Now and in the future by Kurt BrownThe Netflix data platform: Now and in the future by Kurt Brown
The Netflix data platform: Now and in the future by Kurt BrownData Con LA
 
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...confluent
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017Monal Daxini
 
Real-Time Market Data Analytics Using Kafka Streams
Real-Time Market Data Analytics Using Kafka StreamsReal-Time Market Data Analytics Using Kafka Streams
Real-Time Market Data Analytics Using Kafka Streamsconfluent
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Databricks
 
Low-latency real-time data processing at giga-scale with Kafka | John DesJard...
Low-latency real-time data processing at giga-scale with Kafka | John DesJard...Low-latency real-time data processing at giga-scale with Kafka | John DesJard...
Low-latency real-time data processing at giga-scale with Kafka | John DesJard...HostedbyConfluent
 
Introduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OKIntroduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OKKriangkrai Chaonithi
 
Real Time Data Infrastructure team overview
Real Time Data Infrastructure team overviewReal Time Data Infrastructure team overview
Real Time Data Infrastructure team overviewMonal Daxini
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsTom Van den Bulck
 

La actualidad más candente (19)

Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
 
DataOps Automation for a Kafka Streaming Platform (Andrew Stevenson + Spiros ...
DataOps Automation for a Kafka Streaming Platform (Andrew Stevenson + Spiros ...DataOps Automation for a Kafka Streaming Platform (Andrew Stevenson + Spiros ...
DataOps Automation for a Kafka Streaming Platform (Andrew Stevenson + Spiros ...
 
Cloud Connect 2012, Big Data @ Netflix
Cloud Connect 2012, Big Data @ NetflixCloud Connect 2012, Big Data @ Netflix
Cloud Connect 2012, Big Data @ Netflix
 
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul MasterCornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
Cornami Accelerates Performance on SPARK: Spark Summit East talk by Paul Master
 
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
 
Big Data Kappa | Mark Senerth, The Walt Disney Company - DMED, Data Tech
Big Data Kappa | Mark Senerth, The Walt Disney Company - DMED, Data TechBig Data Kappa | Mark Senerth, The Walt Disney Company - DMED, Data Tech
Big Data Kappa | Mark Senerth, The Walt Disney Company - DMED, Data Tech
 
Netflix incloudsmarch8 2011forwiki
Netflix incloudsmarch8 2011forwikiNetflix incloudsmarch8 2011forwiki
Netflix incloudsmarch8 2011forwiki
 
Personalization Journey: From Single Node to Cloud Streaming
Personalization Journey: From Single Node to Cloud StreamingPersonalization Journey: From Single Node to Cloud Streaming
Personalization Journey: From Single Node to Cloud Streaming
 
Maximize the Business Value of Machine Learning and Data Science with Kafka (...
Maximize the Business Value of Machine Learning and Data Science with Kafka (...Maximize the Business Value of Machine Learning and Data Science with Kafka (...
Maximize the Business Value of Machine Learning and Data Science with Kafka (...
 
The Netflix data platform: Now and in the future by Kurt Brown
The Netflix data platform: Now and in the future by Kurt BrownThe Netflix data platform: Now and in the future by Kurt Brown
The Netflix data platform: Now and in the future by Kurt Brown
 
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
 
Real-Time Market Data Analytics Using Kafka Streams
Real-Time Market Data Analytics Using Kafka StreamsReal-Time Market Data Analytics Using Kafka Streams
Real-Time Market Data Analytics Using Kafka Streams
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
Low-latency real-time data processing at giga-scale with Kafka | John DesJard...
Low-latency real-time data processing at giga-scale with Kafka | John DesJard...Low-latency real-time data processing at giga-scale with Kafka | John DesJard...
Low-latency real-time data processing at giga-scale with Kafka | John DesJard...
 
Introduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OKIntroduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OK
 
Real Time Data Infrastructure team overview
Real Time Data Infrastructure team overviewReal Time Data Infrastructure team overview
Real Time Data Infrastructure team overview
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka Streams
 

Similar a A unified analytics platform with Kafka and Flink | Stephan Ewen, Ververica

Data Pipelines -Big Data Meets Salesforce
Data Pipelines -Big Data Meets SalesforceData Pipelines -Big Data Meets Salesforce
Data Pipelines -Big Data Meets SalesforceCarolEnLaNube
 
Unified Data Processing with Apache Flink and Apache Pulsar_Seth Wiesman
Unified Data Processing with Apache Flink and Apache Pulsar_Seth WiesmanUnified Data Processing with Apache Flink and Apache Pulsar_Seth Wiesman
Unified Data Processing with Apache Flink and Apache Pulsar_Seth WiesmanStreamNative
 
Data Pipelines: Big Data Meets Salesforce
Data Pipelines: Big Data Meets SalesforceData Pipelines: Big Data Meets Salesforce
Data Pipelines: Big Data Meets SalesforceSalesforce Developers
 
Spark - Migration Story
Spark - Migration Story Spark - Migration Story
Spark - Migration Story Roman Chukh
 
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...Flink Forward
 
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021StreamNative
 
Webinar - Order out of Chaos: Avoiding the Migration Migraine
Webinar - Order out of Chaos: Avoiding the Migration MigraineWebinar - Order out of Chaos: Avoiding the Migration Migraine
Webinar - Order out of Chaos: Avoiding the Migration MigrainePeak Hosting
 
Talend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupTalend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupModern Data Stack France
 
Chicago AWS user group meetup - May 2014 at Cohesive
Chicago AWS user group meetup - May 2014 at CohesiveChicago AWS user group meetup - May 2014 at Cohesive
Chicago AWS user group meetup - May 2014 at CohesiveAWS Chicago
 
Chicago AWS user group meetup - May 2014 at Cohesive
Chicago AWS user group meetup - May 2014 at CohesiveChicago AWS user group meetup - May 2014 at Cohesive
Chicago AWS user group meetup - May 2014 at CohesiveCloudCamp Chicago
 
Mining AWR V2 - Trend Analysis
Mining AWR V2 - Trend AnalysisMining AWR V2 - Trend Analysis
Mining AWR V2 - Trend AnalysisMaris Elsins
 
(BAC307) The Cold Data Playbook: Building the Ultimate Archive Solution in Am...
(BAC307) The Cold Data Playbook: Building the Ultimate Archive Solution in Am...(BAC307) The Cold Data Playbook: Building the Ultimate Archive Solution in Am...
(BAC307) The Cold Data Playbook: Building the Ultimate Archive Solution in Am...Amazon Web Services
 
How Concur uses Big Data to get you to Tableau Conference On Time
How Concur uses Big Data to get you to Tableau Conference On TimeHow Concur uses Big Data to get you to Tableau Conference On Time
How Concur uses Big Data to get you to Tableau Conference On TimeDenny Lee
 
Maximize Application Performance and Bandwidth Efficiency with WAN Optimization
Maximize Application Performance and Bandwidth Efficiency with WAN OptimizationMaximize Application Performance and Bandwidth Efficiency with WAN Optimization
Maximize Application Performance and Bandwidth Efficiency with WAN OptimizationCisco Enterprise Networks
 
HPC Market Update and Observations on Big Memory
HPC Market Update and Observations on Big MemoryHPC Market Update and Observations on Big Memory
HPC Market Update and Observations on Big MemoryMemVerge
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data EngineeringHarald Erb
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database RoundtableEric Kavanagh
 

Similar a A unified analytics platform with Kafka and Flink | Stephan Ewen, Ververica (20)

Data Pipelines -Big Data Meets Salesforce
Data Pipelines -Big Data Meets SalesforceData Pipelines -Big Data Meets Salesforce
Data Pipelines -Big Data Meets Salesforce
 
Flink SQL in Action
Flink SQL in ActionFlink SQL in Action
Flink SQL in Action
 
Unified Data Processing with Apache Flink and Apache Pulsar_Seth Wiesman
Unified Data Processing with Apache Flink and Apache Pulsar_Seth WiesmanUnified Data Processing with Apache Flink and Apache Pulsar_Seth Wiesman
Unified Data Processing with Apache Flink and Apache Pulsar_Seth Wiesman
 
Data Pipelines: Big Data Meets Salesforce
Data Pipelines: Big Data Meets SalesforceData Pipelines: Big Data Meets Salesforce
Data Pipelines: Big Data Meets Salesforce
 
IPv6 @ Cloudflare
IPv6 @ CloudflareIPv6 @ Cloudflare
IPv6 @ Cloudflare
 
Spark - Migration Story
Spark - Migration Story Spark - Migration Story
Spark - Migration Story
 
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified ...
 
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
 
Webinar - Order out of Chaos: Avoiding the Migration Migraine
Webinar - Order out of Chaos: Avoiding the Migration MigraineWebinar - Order out of Chaos: Avoiding the Migration Migraine
Webinar - Order out of Chaos: Avoiding the Migration Migraine
 
Talend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupTalend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark Meetup
 
Chicago AWS user group meetup - May 2014 at Cohesive
Chicago AWS user group meetup - May 2014 at CohesiveChicago AWS user group meetup - May 2014 at Cohesive
Chicago AWS user group meetup - May 2014 at Cohesive
 
Chicago AWS user group meetup - May 2014 at Cohesive
Chicago AWS user group meetup - May 2014 at CohesiveChicago AWS user group meetup - May 2014 at Cohesive
Chicago AWS user group meetup - May 2014 at Cohesive
 
Mining AWR V2 - Trend Analysis
Mining AWR V2 - Trend AnalysisMining AWR V2 - Trend Analysis
Mining AWR V2 - Trend Analysis
 
(BAC307) The Cold Data Playbook: Building the Ultimate Archive Solution in Am...
(BAC307) The Cold Data Playbook: Building the Ultimate Archive Solution in Am...(BAC307) The Cold Data Playbook: Building the Ultimate Archive Solution in Am...
(BAC307) The Cold Data Playbook: Building the Ultimate Archive Solution in Am...
 
How Concur uses Big Data to get you to Tableau Conference On Time
How Concur uses Big Data to get you to Tableau Conference On TimeHow Concur uses Big Data to get you to Tableau Conference On Time
How Concur uses Big Data to get you to Tableau Conference On Time
 
Maximize Application Performance and Bandwidth Efficiency with WAN Optimization
Maximize Application Performance and Bandwidth Efficiency with WAN OptimizationMaximize Application Performance and Bandwidth Efficiency with WAN Optimization
Maximize Application Performance and Bandwidth Efficiency with WAN Optimization
 
HPC Market Update and Observations on Big Memory
HPC Market Update and Observations on Big MemoryHPC Market Update and Observations on Big Memory
HPC Market Update and Observations on Big Memory
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data Engineering
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 

Más de HostedbyConfluent

Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonHostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolHostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesHostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaHostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonHostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonHostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyHostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonHostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLHostedbyConfluent
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsHostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
 

Más de HostedbyConfluent (20)

Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Último

Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 

Último (20)

Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 

A unified analytics platform with Kafka and Flink | Stephan Ewen, Ververica

  • 1. © 2019 Ververica Stephan Ewen CTO @ Ververica, Apache Flink PMC A Unified Analytics Platform with Apache Flink and Apache Kafka a.k.a. Unified Streaming & Batch Processing
  • 2. © 2019 Ververica 2 Apache Kafka and Apache Flink
  • 3. © 2019 Ververica 3 Apache Kafka and Apache Flink Log Storage File Storage DB/Table s JDB C Debezium Analytics and Applications on Streaming Data & Data-at-Rest
  • 4. © 2019 Ververica 4 Flink Runtime Stateful Stream- and Batch Processing DataStream API StateFun SQL & Table API more declarative more explicit control
  • 5. © 2019 Ververica 5 How big can you go? – Alibaba Singles Day Search Rec. Security BI Ads incl. sub-second updates to the GMV dashboard Real-time Data Applications Infrastructure >5K nodes Data Size 985PB Throughput (Peak) 2.5B events/sec Latency Sub-sec State Size (Biggest) 100TB >500K CPU cores
  • 6. © 2019 Ververica 6 How small can you go? - U-Hopper FogGuru Cluster of 5 Raspberry Pi 3b+ Data volume: 800 events/sec Docker Swarm + Flink + Mosquitto “The Fridge”
  • 7. © 2019 Ververica Streaming & Batch SQL = SQL!
  • 8. © 2019 Ververica 8 Our Sample Setup Real-time Queries real-time events historical data ingest Historical Queries Combined Queries
  • 9. © 2019 Ververica 9 SQL – Static Data (Batch Case) user cnt Mary 2 Bob 1 Liz 1 SELECT user, COUNT(url) as cnt FROM clicks GROUP BY user user cTime url Mary 12:00:00 https://… Bob 12:00:00 https://… Mary 12:00:02 https://… Liz 12:00:03 https://…
  • 10. © 2019 Ververica 10 SQL – Static Data (Streaming Case) user cTime url user cnt SELECT user, COUNT(url) as cnt FROM clicks GROUP BY user Mary 12:00:00 https://… Bob 12:00:00 https://… Mary 12:00:02 https://… Liz 12:00:03 https://… Bob 1 Liz 1 Mary 1 Mary 2
  • 12. © 2019 Ververica Stream to Batches and Back Again
  • 13. © 2019 Ververica 13 ye olde stuff new events 2020-4-1 12:00 am 2016-4-1 1:00 am 2016-4-1 2:00 am 2016-4-7 11:00pm 2016-4-7 10:00pm … Inges t
  • 14. © 2019 Ververica 14 ye olde stuff new events 2020-4-1 12:00 am 2016-4-1 1:00 am 2016-4-1 2:00 am 2016-4-7 11:00pm 2016-4-7 10:00pm … Real-tim e Queries Historical Queries Inges t
  • 15. © 2019 Ververica 15 Why combine data in Kafka and in Object Stores? •Why not just longer Kafka retention? •Sample Numbers from a petabyte-scale Flink + Kafka user – Compressed columnar data avg. 5x smaller – Object Store 3-4x cost cheaper than persistent volumes (EBS) – Object Store already replicated (cross AZ), replaces broker replication (3x) 🡪 ~50x cheaper data storage •Faster access to compressed columnar data for – Higher read parallelism – Read optimizations: Predicate pushdowns, projection pushdowns, partition pruning, etc.
  • 16. © 2019 Ververica 16 ye olde stuff new events 2020-4-1 12:00 am 2016-4-1 1:00 am 2016-4-1 2:00 am 2016-4-7 11:00pm 2016-4-7 10:00pm … Inges t while we are at it… …normalize our data on the way and convert currencies.
  • 17. © 2019 Ververica 17 Temporal Join Symbol Exchange Rate Timestamp Order Timestamp Currency EUR 1.0 12.00 A @ 17.00 $ 12:05 USD USD 0.88 12:01 B @ 12.32 £ 12:11 GBP USD 0.90 12:03 GBP 1.17 12:04 USD 0.89 12:12 GBP 1.20 12:13 USD 0.87 12:17 C @ 111.51 $ 12:01 USD D @ 2.39 $ 12:02 USD E @ 17.11 $ 12:20 USD F @ 243.50 £ 12:15 GBP G @ 3.49 $ 12:10 USD H @ 0.99 £ 12:16 GBP
  • 18. © 2019 Ververica 18 Temporal Join Symbol Exchange Rate Timestamp Order Timestamp Currency EUR 1.0 12.00 A @ 17.00 $ 12:05 USD B @ 12.32 £ 12:11 GBP GBP 1.17 12:04 USD 0.90 12:03 C @ 111.51 $ 12:01 USD D @ 2.39 $ 12:02 USD E @ 17.11 $ 12:20 USD F @ 243.50 £ 12:15 GBP G @ 3.49 $ 12:07 USD H @ 0.99 £ 12:16 GBP @ 12:07h Symbol Exchange Rate Timestamp EUR 1.0 12.00 GBP 1.20 12:13 USD 0.89 12:12 @ 12:15h
  • 21. © 2019 Ververica 21 ye olde stuff new events 2020-4-1 12:00 am 2016-4-1 1:00 am 2016-4-1 2:00 am 2016-4-7 11:00pm 2016-4-7 10:00pm … Bootstrap Streaming Queries Inges t
  • 23. © 2019 Ververica 23 Event Time in Kafka Topic Event time Data (orders / currency rates) Temporal Join Time bucketing on file system Time partition pruning for queries Stitching together Stream again from files and stream tail
  • 25. © 2019 Ververica 25 Apache Kafka and Apache Flink
  • 26. © 2019 Ververica 26 Everything is a Stream Bounded & Unbounded Batch Processing complements Stream Processing (Compressed Columnar) File Storage complements Log Storage make the most of your (event) time SQL generalizes well across batch & streaming a unified engine makes things easier