SlideShare una empresa de Scribd logo
1 de 29
- FABIAN HUESKE, SOFTWARE ENGINEER
- TIMO WALTHER, SOFTWARE ENGINEER
WHY AND HOW TO LEVERAGE
THE POWER AND SIMPLICITY OF
SQL ON APACHE FLINK®
© 2018 data Artisans2
ABOUT DATA ARTISANS
Original creators of
Apache Flink®
Open Source Apache Flink
+ dA Application Manager
© 2018 data Artisans3
DA PLATFORM
data-artisans.com/download
© 2018 data Artisans4
POWERED BY APACHE FLINK
© 2018 data Artisans5
FLINK’S POWERFUL ABSTRACTIONS
Process Function (events, state, time)
DataStream API (streams, windows)
SQL / Table API (dynamic tables)
Stream- & Batch
Data Processing
High-level
Analytics API
Stateful Event-
Driven Applications
val stats = stream
.keyBy("sensor")
.timeWindow(Time.seconds(5))
.sum((a, b) -> a.add(b))
def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = {
// work with event and state
(event, state.value) match { … }
out.collect(…) // emit events
state.update(…) // modify state
// schedule a timer callback
ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
}
Layered abstractions to
navigate simple to complex use cases
© 2018 data Artisans6
APACHE FLINK’S RELATIONAL APIS
Unified APIs for batch & streaming data
A query specifies exactly the same result
regardless whether its input is
static batch data or streaming data.
tableEnvironment
.scan("clicks")
.groupBy('user)
.select('user, 'url.count as 'cnt)
SELECT user, COUNT(url) AS cnt
FROM clicks
GROUP BY user
LINQ-style Table APIANSI SQL
© 2018 data Artisans7
QUERY TRANSLATION
tableEnvironment
.scan("clicks")
.groupBy('user)
.select('user, 'url.count as 'cnt)
SELECT user, COUNT(url) AS cnt
FROM clicks
GROUP BY user
Input data is
bounded
(batch)
Input data is
unbounded
(streaming)
© 2018 data Artisans8
WHAT IF “CLICKS” IS A FILE?
Clicks
user cTime url
Mary 12:00:00 https://…
Bob 12:00:00 https://…
Mary 12:00:02 https://…
Liz 12:00:03 https://…
user cnt
Mary 2
Bob 1
Liz 1
SELECT
user,
COUNT(url) as cnt
FROM clicks
GROUP BY user
Input data is
read at once
Result is produced
at once
© 2018 data Artisans9
WHAT IF “CLICKS” IS A STREAM?
user cTime url
user cnt
SELECT
user,
COUNT(url) as cnt
FROM clicks
GROUP BY user
Clicks
Mary 12:00:00 https://…
Bob 12:00:00 https://…
Mary 12:00:02 https://…
Liz 12:00:03 https://…
Bob 1
Liz 1
Mary 1Mary 2
Input data is
continuously read
Result is
continuously updated
The result is the same!
© 2018 data Artisans10
• Usability
‒ ANSI SQL syntax: No custom “StreamSQL” syntax.
‒ ANSI SQL semantics: No stream-specific results.
• Portability
‒ Run the same query on bounded and unbounded data
‒ Run the same query on recorded and real-time data
• How can we achieve SQL semantics on streams?
now
bounded query
unbounded query
past future
bounded query
start of the stream
unbounded query
WHY IS STREAM-BATCH UNIFICATION IMPORTANT?
© 2018 data Artisans11
• Materialized views (MV) are similar to regular views,
but persisted to disk or memory
‒Used to speed-up analytical queries
‒MVs need to be updated when the base tables change
• MV maintenance is very similar to SQL on streams
‒Base table updates are a stream of DML statements
‒MV definition query is evaluated on that stream
‒MV is query result and continuously updated
DATABASE SYSTEMS RUN QUERIES ON STREAMS
© 2018 data Artisans12
CONTINUOUS QUERIES IN FLINK
• Core concept is a “DynamicTable”
‒Dynamic tables are changing over time
• Queries on dynamic tables
‒produce new dynamic tables (which are updated based on input)
‒do not terminate
• Stream ↔ Dynamic table conversions
12
© 2018 data Artisans13
STREAM ↔ DYNAMIC TABLE CONVERSIONS
• Append Conversions
‒Records are only inserted/appended
• Upsert Conversions
‒Records are inserted/updated/deleted
‒Records have a (composite) unique key
• ChangelogConversions
‒Records are inserted/updated/deleted
© 2018 data Artisans14
SQL FEATURE SET IN FLINK 1.5.0
• SELECT FROMWHERE
• GROUP BY / HAVING
‒ Non-windowed,TUMBLE, HOP, SESSION windows
• JOIN
‒ Windowed INNER, LEFT / RIGHT / FULL OUTER JOIN
‒ Non-windowed INNER JOIN
• Scalar, aggregation, table-valued UDFs
• SQL CLI Client (beta)
• [streaming only] OVER /WINDOW
‒ UNBOUNDED / BOUNDED PRECEDING
• [batch only] UNION / INTERSECT / EXCEPT / IN / ORDER BY
© 2018 data Artisans15
WHAT CAN I BUILD WITH THIS?
• Data Pipelines
‒ Transform, aggregate, and move events in real-time
• Low-latency ETL
‒ Convert and write streams to file systems, DBMS, K-V stores, indexes, …
‒ Ingest appearing files to produce streams
• Stream & Batch Analytics
‒ Run analytical queries over bounded and unbounded data
‒ Query and compare historic and real-time data
• Power Live Dashboards
‒ Compute and update data to visualize in real-time
© 2018 data Artisans16
THE NEW YORK TAXI RIDES DATA SET
• The NewYork CityTaxi & Limousine Commission provides a public data set
about past taxi rides in NewYork City
• We can derive a streaming table from the data
• Table: TaxiRides
rideId: BIGINT // ID of the taxi ride
isStart: BOOLEAN // flag for pick-up (true) or drop-off (false) event
lon: DOUBLE // longitude of pick-up or drop-off location
lat: DOUBLE // latitude of pick-up or drop-off location
rowtime: TIMESTAMP // time of pick-up or drop-off event
© 2018 data Artisans17
SELECT cell,
isStart,
HOP_END(rowtime, INTERVAL '5' MINUTE, INTERVAL '15' MINUTE) AS hopEnd,
COUNT(*) AS cnt
FROM (SELECT rowtime, isStart, toCellId(lon, lat) AS cell
FROM TaxiRides)
GROUP BY cell,
isStart,
HOP(rowtime, INTERVAL '5' MINUTE, INTERVAL '15' MINUTE)
 Compute every 5 minutes for each location the
number of departing and arriving taxis
of the last 15 minutes.
IDENTIFY POPULAR PICK-UP / DROP-OFF LOCATIONS
© 2018 data Artisans18
SELECT pickUpCell,
AVG(TIMESTAMPDIFF(MINUTE, e.rowtime, s.rowtime) AS avgDuration
FROM (SELECT rideId, rowtime, toCellId(lon, lat) AS pickUpCell
FROM TaxiRides
WHERE isStart) s
JOIN
(SELECT rideId, rowtime
FROM TaxiRides
WHERE NOT isStart) e
ON s.rideId = e.rideId AND
e.rowtime BETWEEN s.rowtime AND s.rowtime + INTERVAL '1' HOUR
GROUP BY pickUpCell
 Join start ride and end ride events on rideId and
compute average ride duration per pick-up location.
AVERAGE RIDE DURATION PER PICK-UP LOCATION
© 2018 data Artisans19
BUILDING A DASHBOARD
Elastic
Search
Kafka
SELECT cell,
isStart,
HOP_END(rowtime, INTERVAL '5' MINUTE, INTERVAL '15' MINUTE) AS hopEnd,
COUNT(*) AS cnt
FROM (SELECT rowtime, isStart, toCellId(lon, lat) AS cell
FROM TaxiRides)
GROUP BY cell,
isStart,
HOP(rowtime, INTERVAL '5' MINUTE, INTERVAL '15' MINUTE)
© 2018 data Artisans20
SOUNDS GREAT! HOW CAN I USE IT?
• ATM, SQL queries must be embedded in Java/Scala code 
‒ Tight integration with DataStream and DataSet APIs
• Community focused on internals (until Flink 1.4.0)
‒ Operators, types, built-in functions, extensibility (UDFs, extern. catalog)
‒ Proven at scale by Alibaba, Huawei, and Uber
‒ All built their own submission system & connectors library
• Community neglected user interfaces
‒ No query submission client, no CLI
‒ No integration with common catalog services
‒ Limited set ofTableSources andTableSinks
© 2018 data Artisans21
COMING IN FLINK 1.5.0 - SQL CLI
DemoTime!
That’s a nice toy, but …
... can I use it for anything serious?
© 2018 data Artisans22
FLIP-24 – A SQL QUERY SERVICE
• REST service to submit & manage SQL queries
‒ SELECT …
‒ INSERT INTO SELECT …
‒ CREATE MATERIALIZE VIEW …
• Serve results of “SELECT …” queries
• Provide a table catalog (integrated with external catalogs)
• Use cases
‒ Data exploration with notebooks like Apache Zeppelin
‒ Access to real-time data from applications
‒ Easy data routing / ETL from management consoles
© 2018 data Artisans23
CHALLENGE: SERVE DYNAMIC TABLES
Unbounded input yields unbounded results
SELECT user, COUNT(url) AS cnt
FROM clicks
GROUP BY user
SELECT user, url
FROM clicks
WHERE url LIKE '%xyz.com'
Append-onlyTable
• Result rows are never changed
• Consume, buffer, or drop rows
Continuously updatingTable
• Result rows can be updated or
deleted
• Consume changelog or
periodically query result table
• Result table must be maintained
somewhere
(Serving bounded results is easy)
© 2018 data Artisans24
Application
FLIP-24 – A SQL QUERY SERVICE
Query Service
Catalog
Optimizer
Database /
HDFS
Event Log
External Catalog
(Schema Registry,
HCatalog, …)
Query
Results
Submit Query Job
State
REST
Result Server
Submit Query
REST
Database /
HDFS
Event Log
SELECT
user,
COUNT(url) AS cnt
FROM clicks
GROUP BY user
Results are served by Query Service via REST
+ Application does not need a special client
+ Works well in many network configurations
− Query service can become bottleneck
© 2018 data Artisans25
Application
FLIP-24 – A SQL QUERY SERVICE
Query Service
SELECT
user,
COUNT(url) AS cnt
FROM clicks
GROUP BY user Catalog
Optimizer
Database /
HDFS
Event Log
External Catalog
(Schema Registry,
HCatalog, …)
Query
Submit Query Job
State
REST
Result Server
Submit Query
REST
Database /
HDFS
Event Log
Serving
Library
Result Handle
© 2018 data Artisans26
WE WANT YOUR FEEDBACK!
• The design of SQL Query Service is not final yet.
• Check out FLIP-24 and FLINK-7594
• Share your ideas and feedback and discuss on
JIRA or dev@flink.apache.org.
© 2018 data Artisans27
SUMMARY
• Unification of stream and batch is important.
• Flink’s SQL solves many streaming and batch use cases.
• Runs in production at Alibaba, Uber, and others.
• The community is working on improving user interfaces.
• Get involved, discuss, and contribute!
THANK YOU!
Available on O’Reilly Early Release!
THANK YOU!
@fhueske & @twalthr
@dataArtisans
@ApacheFlink
WE ARE HIRING
data-artisans.com/careers

Más contenido relacionado

La actualidad más candente

Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m...
Flink Forward San Francisco 2018:  David Reniz & Dahyr Vergara - "Real-time m...Flink Forward San Francisco 2018:  David Reniz & Dahyr Vergara - "Real-time m...
Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m...Flink Forward
 
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...Flink Forward
 
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward
 
Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processin...
Flink Forward San Francisco 2018 keynote:  Srikanth Satya - "Stream Processin...Flink Forward San Francisco 2018 keynote:  Srikanth Satya - "Stream Processin...
Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processin...Flink Forward
 
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...Flink Forward
 
Make streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQLMake streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQLDataWorks Summit
 
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...Flink Forward
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...confluent
 
apidays LIVE Singapore 2021 - REST the Events - REST APIs for Event-Driven Ar...
apidays LIVE Singapore 2021 - REST the Events - REST APIs for Event-Driven Ar...apidays LIVE Singapore 2021 - REST the Events - REST APIs for Event-Driven Ar...
apidays LIVE Singapore 2021 - REST the Events - REST APIs for Event-Driven Ar...apidays
 
Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin...
Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin...Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin...
Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin...Flink Forward
 
Stream Analytics with SQL on Apache Flink
 Stream Analytics with SQL on Apache Flink Stream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache FlinkFabian Hueske
 
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...Flink Forward
 
The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®Aljoscha Krettek
 
Apache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondApache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondBowen Li
 
Scaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache FlinkScaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache FlinkTill Rohrmann
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®confluent
 
The Past, Present, and Future of Apache Flink
The Past, Present, and Future of Apache FlinkThe Past, Present, and Future of Apache Flink
The Past, Present, and Future of Apache FlinkAljoscha Krettek
 
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)KafkaZone
 
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward
 

La actualidad más candente (20)

Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m...
Flink Forward San Francisco 2018:  David Reniz & Dahyr Vergara - "Real-time m...Flink Forward San Francisco 2018:  David Reniz & Dahyr Vergara - "Real-time m...
Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m...
 
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
Flink Forward Berlin 2018: Aljoscha Krettek & Till Rohrmann - Keynote: "A Yea...
 
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
Flink Forward Berlin 2017 Keynote: Ferd Scheepers - Taking away customer fric...
 
Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processin...
Flink Forward San Francisco 2018 keynote:  Srikanth Satya - "Stream Processin...Flink Forward San Francisco 2018 keynote:  Srikanth Satya - "Stream Processin...
Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processin...
 
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
Flink Forward Berlin 2017: Mihail Vieru - A Materialization Engine for Data I...
 
Make streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQLMake streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQL
 
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...
Flink Forward Berlin 2018: Oleksandr Nitavskyi - "Data lossless event time st...
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
 
apidays LIVE Singapore 2021 - REST the Events - REST APIs for Event-Driven Ar...
apidays LIVE Singapore 2021 - REST the Events - REST APIs for Event-Driven Ar...apidays LIVE Singapore 2021 - REST the Events - REST APIs for Event-Driven Ar...
apidays LIVE Singapore 2021 - REST the Events - REST APIs for Event-Driven Ar...
 
Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin...
Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin...Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin...
Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin...
 
Stream Analytics with SQL on Apache Flink
 Stream Analytics with SQL on Apache Flink Stream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache Flink
 
dA Platform Overview
dA Platform OverviewdA Platform Overview
dA Platform Overview
 
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...
Flink Forward Berlin 2018: Viktor Klang - Keynote "The convergence of stream ...
 
The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®The Past, Present, and Future of Apache Flink®
The Past, Present, and Future of Apache Flink®
 
Apache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondApache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyond
 
Scaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache FlinkScaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache Flink
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®
 
The Past, Present, and Future of Apache Flink
The Past, Present, and Future of Apache FlinkThe Past, Present, and Future of Apache Flink
The Past, Present, and Future of Apache Flink
 
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
 
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
 

Similar a Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how to leverage the simplicity and power of SQL on Flink"

Why and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkWhy and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkDataWorks Summit
 
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"Flink Forward
 
Streaming SQL to unify batch and stream processing: Theory and practice with ...
Streaming SQL to unify batch and stream processing: Theory and practice with ...Streaming SQL to unify batch and stream processing: Theory and practice with ...
Streaming SQL to unify batch and stream processing: Theory and practice with ...Fabian Hueske
 
Webinar: Flink SQL in Action - Fabian Hueske
 Webinar: Flink SQL in Action - Fabian Hueske Webinar: Flink SQL in Action - Fabian Hueske
Webinar: Flink SQL in Action - Fabian HueskeVerverica
 
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...Flink Forward
 
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHINGBig Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHINGMatt Stubbs
 
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scaleDataScienceConferenc1
 
Microsoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview PresentationMicrosoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview PresentationMicrosoft Private Cloud
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon RedshiftAmazon Web Services
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftAmazon Web Services
 
Loading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFLoading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFAmazon Web Services
 
The art of the event streaming application: streams, stream processors and sc...
The art of the event streaming application: streams, stream processors and sc...The art of the event streaming application: streams, stream processors and sc...
The art of the event streaming application: streams, stream processors and sc...confluent
 
Kafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming appKafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming appNeil Avery
 
Streaming sql w kafka and flink
Streaming sql w  kafka and flinkStreaming sql w  kafka and flink
Streaming sql w kafka and flinkKenny Gorman
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebookAniket Mokashi
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfIlham31574
 

Similar a Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how to leverage the simplicity and power of SQL on Flink" (20)

Why and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkWhy and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on Flink
 
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
Flink Forward Berlin 2018: Timo Walther - "Flink SQL in Action"
 
Streaming SQL to unify batch and stream processing: Theory and practice with ...
Streaming SQL to unify batch and stream processing: Theory and practice with ...Streaming SQL to unify batch and stream processing: Theory and practice with ...
Streaming SQL to unify batch and stream processing: Theory and practice with ...
 
Webinar: Flink SQL in Action - Fabian Hueske
 Webinar: Flink SQL in Action - Fabian Hueske Webinar: Flink SQL in Action - Fabian Hueske
Webinar: Flink SQL in Action - Fabian Hueske
 
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
Flink Forward Berlin 2018: Stephan Ewen - Keynote: "Unlocking the next wave o...
 
Sukhwant resume
Sukhwant resumeSukhwant resume
Sukhwant resume
 
Sukhwant resume
Sukhwant resumeSukhwant resume
Sukhwant resume
 
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHINGBig Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING
Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING
 
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
 
Microsoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview PresentationMicrosoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview Presentation
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon Redshift
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF Loft
 
Loading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFLoading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SF
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
The art of the event streaming application: streams, stream processors and sc...
The art of the event streaming application: streams, stream processors and sc...The art of the event streaming application: streams, stream processors and sc...
The art of the event streaming application: streams, stream processors and sc...
 
Kafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming appKafka summit SF 2019 - the art of the event-streaming app
Kafka summit SF 2019 - the art of the event-streaming app
 
Streaming sql w kafka and flink
Streaming sql w  kafka and flinkStreaming sql w  kafka and flink
Streaming sql w kafka and flink
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebook
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdf
 

Más de Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 

Más de Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 

Último

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 

Último (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 

Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how to leverage the simplicity and power of SQL on Flink"

  • 1. - FABIAN HUESKE, SOFTWARE ENGINEER - TIMO WALTHER, SOFTWARE ENGINEER WHY AND HOW TO LEVERAGE THE POWER AND SIMPLICITY OF SQL ON APACHE FLINK®
  • 2. © 2018 data Artisans2 ABOUT DATA ARTISANS Original creators of Apache Flink® Open Source Apache Flink + dA Application Manager
  • 3. © 2018 data Artisans3 DA PLATFORM data-artisans.com/download
  • 4. © 2018 data Artisans4 POWERED BY APACHE FLINK
  • 5. © 2018 data Artisans5 FLINK’S POWERFUL ABSTRACTIONS Process Function (events, state, time) DataStream API (streams, windows) SQL / Table API (dynamic tables) Stream- & Batch Data Processing High-level Analytics API Stateful Event- Driven Applications val stats = stream .keyBy("sensor") .timeWindow(Time.seconds(5)) .sum((a, b) -> a.add(b)) def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = { // work with event and state (event, state.value) match { … } out.collect(…) // emit events state.update(…) // modify state // schedule a timer callback ctx.timerService.registerEventTimeTimer(event.timestamp + 500) } Layered abstractions to navigate simple to complex use cases
  • 6. © 2018 data Artisans6 APACHE FLINK’S RELATIONAL APIS Unified APIs for batch & streaming data A query specifies exactly the same result regardless whether its input is static batch data or streaming data. tableEnvironment .scan("clicks") .groupBy('user) .select('user, 'url.count as 'cnt) SELECT user, COUNT(url) AS cnt FROM clicks GROUP BY user LINQ-style Table APIANSI SQL
  • 7. © 2018 data Artisans7 QUERY TRANSLATION tableEnvironment .scan("clicks") .groupBy('user) .select('user, 'url.count as 'cnt) SELECT user, COUNT(url) AS cnt FROM clicks GROUP BY user Input data is bounded (batch) Input data is unbounded (streaming)
  • 8. © 2018 data Artisans8 WHAT IF “CLICKS” IS A FILE? Clicks user cTime url Mary 12:00:00 https://… Bob 12:00:00 https://… Mary 12:00:02 https://… Liz 12:00:03 https://… user cnt Mary 2 Bob 1 Liz 1 SELECT user, COUNT(url) as cnt FROM clicks GROUP BY user Input data is read at once Result is produced at once
  • 9. © 2018 data Artisans9 WHAT IF “CLICKS” IS A STREAM? user cTime url user cnt SELECT user, COUNT(url) as cnt FROM clicks GROUP BY user Clicks Mary 12:00:00 https://… Bob 12:00:00 https://… Mary 12:00:02 https://… Liz 12:00:03 https://… Bob 1 Liz 1 Mary 1Mary 2 Input data is continuously read Result is continuously updated The result is the same!
  • 10. © 2018 data Artisans10 • Usability ‒ ANSI SQL syntax: No custom “StreamSQL” syntax. ‒ ANSI SQL semantics: No stream-specific results. • Portability ‒ Run the same query on bounded and unbounded data ‒ Run the same query on recorded and real-time data • How can we achieve SQL semantics on streams? now bounded query unbounded query past future bounded query start of the stream unbounded query WHY IS STREAM-BATCH UNIFICATION IMPORTANT?
  • 11. © 2018 data Artisans11 • Materialized views (MV) are similar to regular views, but persisted to disk or memory ‒Used to speed-up analytical queries ‒MVs need to be updated when the base tables change • MV maintenance is very similar to SQL on streams ‒Base table updates are a stream of DML statements ‒MV definition query is evaluated on that stream ‒MV is query result and continuously updated DATABASE SYSTEMS RUN QUERIES ON STREAMS
  • 12. © 2018 data Artisans12 CONTINUOUS QUERIES IN FLINK • Core concept is a “DynamicTable” ‒Dynamic tables are changing over time • Queries on dynamic tables ‒produce new dynamic tables (which are updated based on input) ‒do not terminate • Stream ↔ Dynamic table conversions 12
  • 13. © 2018 data Artisans13 STREAM ↔ DYNAMIC TABLE CONVERSIONS • Append Conversions ‒Records are only inserted/appended • Upsert Conversions ‒Records are inserted/updated/deleted ‒Records have a (composite) unique key • ChangelogConversions ‒Records are inserted/updated/deleted
  • 14. © 2018 data Artisans14 SQL FEATURE SET IN FLINK 1.5.0 • SELECT FROMWHERE • GROUP BY / HAVING ‒ Non-windowed,TUMBLE, HOP, SESSION windows • JOIN ‒ Windowed INNER, LEFT / RIGHT / FULL OUTER JOIN ‒ Non-windowed INNER JOIN • Scalar, aggregation, table-valued UDFs • SQL CLI Client (beta) • [streaming only] OVER /WINDOW ‒ UNBOUNDED / BOUNDED PRECEDING • [batch only] UNION / INTERSECT / EXCEPT / IN / ORDER BY
  • 15. © 2018 data Artisans15 WHAT CAN I BUILD WITH THIS? • Data Pipelines ‒ Transform, aggregate, and move events in real-time • Low-latency ETL ‒ Convert and write streams to file systems, DBMS, K-V stores, indexes, … ‒ Ingest appearing files to produce streams • Stream & Batch Analytics ‒ Run analytical queries over bounded and unbounded data ‒ Query and compare historic and real-time data • Power Live Dashboards ‒ Compute and update data to visualize in real-time
  • 16. © 2018 data Artisans16 THE NEW YORK TAXI RIDES DATA SET • The NewYork CityTaxi & Limousine Commission provides a public data set about past taxi rides in NewYork City • We can derive a streaming table from the data • Table: TaxiRides rideId: BIGINT // ID of the taxi ride isStart: BOOLEAN // flag for pick-up (true) or drop-off (false) event lon: DOUBLE // longitude of pick-up or drop-off location lat: DOUBLE // latitude of pick-up or drop-off location rowtime: TIMESTAMP // time of pick-up or drop-off event
  • 17. © 2018 data Artisans17 SELECT cell, isStart, HOP_END(rowtime, INTERVAL '5' MINUTE, INTERVAL '15' MINUTE) AS hopEnd, COUNT(*) AS cnt FROM (SELECT rowtime, isStart, toCellId(lon, lat) AS cell FROM TaxiRides) GROUP BY cell, isStart, HOP(rowtime, INTERVAL '5' MINUTE, INTERVAL '15' MINUTE)  Compute every 5 minutes for each location the number of departing and arriving taxis of the last 15 minutes. IDENTIFY POPULAR PICK-UP / DROP-OFF LOCATIONS
  • 18. © 2018 data Artisans18 SELECT pickUpCell, AVG(TIMESTAMPDIFF(MINUTE, e.rowtime, s.rowtime) AS avgDuration FROM (SELECT rideId, rowtime, toCellId(lon, lat) AS pickUpCell FROM TaxiRides WHERE isStart) s JOIN (SELECT rideId, rowtime FROM TaxiRides WHERE NOT isStart) e ON s.rideId = e.rideId AND e.rowtime BETWEEN s.rowtime AND s.rowtime + INTERVAL '1' HOUR GROUP BY pickUpCell  Join start ride and end ride events on rideId and compute average ride duration per pick-up location. AVERAGE RIDE DURATION PER PICK-UP LOCATION
  • 19. © 2018 data Artisans19 BUILDING A DASHBOARD Elastic Search Kafka SELECT cell, isStart, HOP_END(rowtime, INTERVAL '5' MINUTE, INTERVAL '15' MINUTE) AS hopEnd, COUNT(*) AS cnt FROM (SELECT rowtime, isStart, toCellId(lon, lat) AS cell FROM TaxiRides) GROUP BY cell, isStart, HOP(rowtime, INTERVAL '5' MINUTE, INTERVAL '15' MINUTE)
  • 20. © 2018 data Artisans20 SOUNDS GREAT! HOW CAN I USE IT? • ATM, SQL queries must be embedded in Java/Scala code  ‒ Tight integration with DataStream and DataSet APIs • Community focused on internals (until Flink 1.4.0) ‒ Operators, types, built-in functions, extensibility (UDFs, extern. catalog) ‒ Proven at scale by Alibaba, Huawei, and Uber ‒ All built their own submission system & connectors library • Community neglected user interfaces ‒ No query submission client, no CLI ‒ No integration with common catalog services ‒ Limited set ofTableSources andTableSinks
  • 21. © 2018 data Artisans21 COMING IN FLINK 1.5.0 - SQL CLI DemoTime! That’s a nice toy, but … ... can I use it for anything serious?
  • 22. © 2018 data Artisans22 FLIP-24 – A SQL QUERY SERVICE • REST service to submit & manage SQL queries ‒ SELECT … ‒ INSERT INTO SELECT … ‒ CREATE MATERIALIZE VIEW … • Serve results of “SELECT …” queries • Provide a table catalog (integrated with external catalogs) • Use cases ‒ Data exploration with notebooks like Apache Zeppelin ‒ Access to real-time data from applications ‒ Easy data routing / ETL from management consoles
  • 23. © 2018 data Artisans23 CHALLENGE: SERVE DYNAMIC TABLES Unbounded input yields unbounded results SELECT user, COUNT(url) AS cnt FROM clicks GROUP BY user SELECT user, url FROM clicks WHERE url LIKE '%xyz.com' Append-onlyTable • Result rows are never changed • Consume, buffer, or drop rows Continuously updatingTable • Result rows can be updated or deleted • Consume changelog or periodically query result table • Result table must be maintained somewhere (Serving bounded results is easy)
  • 24. © 2018 data Artisans24 Application FLIP-24 – A SQL QUERY SERVICE Query Service Catalog Optimizer Database / HDFS Event Log External Catalog (Schema Registry, HCatalog, …) Query Results Submit Query Job State REST Result Server Submit Query REST Database / HDFS Event Log SELECT user, COUNT(url) AS cnt FROM clicks GROUP BY user Results are served by Query Service via REST + Application does not need a special client + Works well in many network configurations − Query service can become bottleneck
  • 25. © 2018 data Artisans25 Application FLIP-24 – A SQL QUERY SERVICE Query Service SELECT user, COUNT(url) AS cnt FROM clicks GROUP BY user Catalog Optimizer Database / HDFS Event Log External Catalog (Schema Registry, HCatalog, …) Query Submit Query Job State REST Result Server Submit Query REST Database / HDFS Event Log Serving Library Result Handle
  • 26. © 2018 data Artisans26 WE WANT YOUR FEEDBACK! • The design of SQL Query Service is not final yet. • Check out FLIP-24 and FLINK-7594 • Share your ideas and feedback and discuss on JIRA or dev@flink.apache.org.
  • 27. © 2018 data Artisans27 SUMMARY • Unification of stream and batch is important. • Flink’s SQL solves many streaming and batch use cases. • Runs in production at Alibaba, Uber, and others. • The community is working on improving user interfaces. • Get involved, discuss, and contribute!
  • 28. THANK YOU! Available on O’Reilly Early Release!
  • 29. THANK YOU! @fhueske & @twalthr @dataArtisans @ApacheFlink WE ARE HIRING data-artisans.com/careers

Notas del editor

  1. • data Artisans was founded by the original creators of Apache Flink • We provide dA Platform, a complete stream processing infrastructure with open-source Apache Flink
  2. • Also included is the Application Manager, which turns dA Platform into a self-service platform for stateful stream processing applications. • dA Platform is generally available, and you can download a free trial today!
  3. • These companies are among many users of Apache Flink, and during this conference you’ll meet folks from some of these companies as well as others using Flink. • If your company would like to be represented on the “Powered by Apache Flink” page, email me.
  4. Flink offers APIs for different levels of abstraction that all can be mixed and matched. On the lowest level, ProcessFunctions give precise control about state and time, i.e., when to process data. The intermediate level, the so-called DataStream API provides higher-level primitives such as for window processing Finally, on the top level the relational API, SQL and the Table API, are centered around the concept of dynamic tables. This is what I’m going to talk about next.
  5. (Keep this slide up during the Q&A part of your talk. Having this up in the final 5-10 minutes of the session gives the audience something useful to look at.)