Flink Forward San Francisco 2019: How to Join Two Data Streams? - Piotr Nowojski

Joins are one of the most common operations in SQL. However, it is far from trivial to express and execute them in a streaming environment with continuously running queries. In this talk we first look at why join operations are more difficult on infinite data streams. Next we examine a couple of approaches to this problem, such as Time-windowed Joins and a recent addition to Flink SQL: Temporal Joins. Temporal Tables and Temporal Joins are new concepts that provide an efficient solution to common problems such as data enrichment. Before Flink 1.7, data enrichment in SQL was often impossible to express using Windowed Joins, or very inefficient when using Regular Joins. With Temporal Joins, Flink provides an interesting and ANSI SQL compliant alternative for joining two data streams.

Published in: Technology

  1. How to Join Two Data Streams? Piotr Nowojski. © 2019 Ververica
  2. About Ververica • Original creators of Apache Flink® • Complete Stream Processing Infrastructure
  3. Agenda • Joins in Batch SQL • Why Streaming is different? • Time-windowed joins • Temporal Table joins
  4. Joins in Batch SQL
  5. Joins in Batch SQL: SELECT a.id FROM A a, B b WHERE a.id = b.id
  6. SELECT a.id FROM A a, B b WHERE a.id = b.id. Table A: 1, 42, 2, 3, 6. Table B: 42, 7, 3, 1. Result: 1, 42, 3
  7. Joins in Batch SQL • Two traditional approaches – Sort-Merge Join – Hash Join
  8. Joins in Streaming SQL
  9. Join in continuous queries. Table A: ∅, ... Table B: 42, ... Result: ∅, ...
  10. Join in continuous queries. Table A: 1, ... Table B: 42, ... Result: ∅, ...
  11. Join in continuous queries. Table A: 1, 42, ... Table B: 42, ... Result: 42, ...
  12. Join in continuous queries. Table A: 1, 42, ... Table B: 42, 7, ... Result: 42, ...
  13. Join in continuous queries. Table A: 1, 42, ... Table B: 42, 7, 3, ... Result: 42, ...
  14. Join in continuous queries. Table A: 1, 42, ... Table B: 42, 7, 3, 1, ... Result: 42, 1, ...
  15. Join in continuous queries. Table A: 1, 42, 2, 3, 6, ... Table B: 42, 7, 3, 1, ... Result: 42, 1, 3, ...
  16. Challenges in Streaming SQL • Continuous queries • Unbounded inputs • Traditional algorithms do not work well – Sort-Merge Join is infeasible – Hash Join resource usage
  17. Time-windowed Join
  18. Watermarks [figure: an out-of-order stream of events, each labelled with its event timestamp (the most recent elements are on the left side), with Watermark 17 and Watermark 11 marked in the stream]
  19. Time-windowed Join: SELECT * FROM Orders o, Shipments s WHERE o.id = s.orderId AND s.shiptime BETWEEN o.ordertime AND o.ordertime + INTERVAL '4' HOUR
  20. Time-windowed Join
  21. Time-windowed Join
  22. Temporal Table Join
  23. Example. RatesHistory (time, currency, rate): 09:00 USD 102; 09:00 Euro 114; 09:00 Yen 1; 10:45 Euro 116; 11:15 Euro 119; 11:49 USD 99; ... Orders (time, currency, amount): 10:15 Euro 2; 11:00 Yen 50; 11:35 Euro 2; ...
  24. RatesHistory (same rows as on slide 23). TemporalTableFunction rates = ratesHistory.createTemporalTableFunction("time", /* "versioning" field */ "currency" /* primary key */); tableEnv.registerFunction("Rates", rates);
  25. Example. RatesHistory (same rows as on slide 23). SELECT * FROM Rates('10:15'); returns (time, currency, rate): 09:00 USD 102; 09:00 Euro 114; 09:00 Yen 1
  26. Example. RatesHistory (same rows as on slide 23). SELECT * FROM Rates('11:50'); returns (time, currency, rate): 11:49 USD 99; 11:15 Euro 119; 09:00 Yen 1
  27. Temporal Table Join example: SELECT o.amount * r.rate FROM Orders o, LATERAL TABLE (Rates(o.time)) r WHERE o.currency = r.currency
  28. Example. RatesHistory (same rows as on slide 23). Orders (time, currency, amount): 10:15 Euro 2; ... Result (rate * amount): 228; ... SELECT o.amount * r.rate FROM Orders o, LATERAL TABLE (Rates(o.time)) r WHERE o.currency = r.currency
  29. Example. RatesHistory (same rows as on slide 23). Orders (time, currency, amount): 10:15 Euro 2; 11:00 Yen 50; ... Result (rate * amount): 228; 50; ...
  30. Example. RatesHistory (same rows as on slide 23). Orders (time, currency, amount): 10:15 Euro 2; 11:00 Yen 50; 11:35 Euro 2; ... Result (rate * amount): 228; 50; 238; ...
  31. Regular joins. Syntax: SELECT * FROM A a, B b WHERE a.id = b.id. Scope: all rows from both tables are visible to each other. Memory: all records must be persisted indefinitely.
  32. Time-windowed Joins. Syntax: SELECT * FROM A a, B b WHERE a.id = b.id AND a.time BETWEEN b.time AND b.time + INTERVAL '1' DAY. Scope: rows are visible within a defined time window. Memory: all records between now and the window length (plus watermark delay).
  33. Temporal Table Joins. Syntax: SELECT * FROM A a, LATERAL TABLE (B(a.time)) b WHERE a.id = b.id. Scope: only the latest version of B for the given a.time is visible. Memory: Table A: all records between now and the watermark delay; Table B: all versions of B between now and the watermark delay.
  34. Thank you!
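
The continuous join walked through on slides 9 to 15 can be imitated with a toy symmetric hash join. This is a minimal Python sketch, not Flink's implementation: the class name `StreamingHashJoin` and the arrival order in `events` are assumptions, the latter inferred from the order in which rows appear on the slides. It illustrates the "Regular joins" row of the summary: every row of both inputs stays in state indefinitely.

```python
from collections import defaultdict


class StreamingHashJoin:
    """Toy symmetric hash join on a single key (hypothetical helper, not a Flink API)."""

    def __init__(self):
        # One hash table per input; each maps key -> number of rows seen so far.
        self.state = {"A": defaultdict(int), "B": defaultdict(int)}

    def on_row(self, side, key):
        """Store the new row, then emit one result per matching row on the other side."""
        other = "B" if side == "A" else "A"
        self.state[side][key] += 1
        return [key] * self.state[other].get(key, 0)

    def rows_in_state(self):
        return sum(sum(table.values()) for table in self.state.values())


# Assumed arrival order, following the slide build-up.
events = [("B", 42), ("A", 1), ("A", 42), ("B", 7), ("B", 3),
          ("B", 1), ("A", 2), ("A", 3), ("A", 6)]

join = StreamingHashJoin()
results = []
for side, key in events:
    results.extend(join.on_row(side, key))

print(results)               # matches emitted in arrival order
print(join.rows_in_state())  # nothing is ever evicted
```

Because no row is ever evicted, state grows with the input, which is the resource problem slide 16 attributes to hash joins on unbounded streams.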

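
The time-windowed join from slide 19 bounds state because a watermark tells the operator which rows can no longer find a partner. The sketch below is an illustration under stated assumptions, not Flink's operator: timestamps are plain hours, order ids are unique, no event arrives behind the watermark, and the names `WindowedJoin`, `on_order`, `on_shipment` and `on_watermark` are hypothetical. The sample events are invented for the example.

```python
from dataclasses import dataclass, field

WINDOW_HOURS = 4  # mirrors INTERVAL '4' HOUR in the slide's query


@dataclass
class WindowedJoin:
    orders: dict = field(default_factory=dict)     # order id -> ordertime
    shipments: list = field(default_factory=list)  # (orderId, shiptime)

    @staticmethod
    def _in_window(ordertime, shiptime):
        # s.shiptime BETWEEN o.ordertime AND o.ordertime + 4 hours
        return ordertime <= shiptime <= ordertime + WINDOW_HOURS

    def on_order(self, order_id, ordertime):
        self.orders[order_id] = ordertime
        return [(order_id, ordertime, t) for oid, t in self.shipments
                if oid == order_id and self._in_window(ordertime, t)]

    def on_shipment(self, order_id, shiptime):
        self.shipments.append((order_id, shiptime))
        ordertime = self.orders.get(order_id)
        if ordertime is not None and self._in_window(ordertime, shiptime):
            return [(order_id, ordertime, shiptime)]
        return []

    def on_watermark(self, watermark):
        # Future shipments have shiptime >= watermark, so an order whose
        # window already closed can be dropped.
        self.orders = {oid: t for oid, t in self.orders.items()
                       if t + WINDOW_HOURS >= watermark}
        # Future orders have ordertime >= watermark, so a shipment older
        # than the watermark can no longer match.
        self.shipments = [(oid, t) for oid, t in self.shipments if t >= watermark]


join = WindowedJoin()
out = []
out += join.on_order(1, 9)
out += join.on_shipment(1, 11)   # within 4 hours of order 1
out += join.on_order(2, 10)
join.on_watermark(15)            # both orders and the first shipment expire
out += join.on_shipment(2, 16)   # too late for order 2
print(out)
```

State is limited to rows that could still match, which corresponds to the "window length (plus watermark delay)" memory bound on slide 32.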
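
The temporal table lookups on slides 25 to 30 can be reproduced with a small Python model. It is only a sketch of the semantics (for each order, pick the latest rate version at or before the order time), not of how Flink executes the join. The data is taken from the slides; the helper names `build_versions`, `rate_at` and `rates` are hypothetical, and times are kept as zero-padded "HH:MM" strings so that string comparison orders them correctly.

```python
from bisect import bisect_right

rates_history = [
    ("09:00", "USD", 102), ("09:00", "Euro", 114), ("09:00", "Yen", 1),
    ("10:45", "Euro", 116), ("11:15", "Euro", 119), ("11:49", "USD", 99),
]
orders = [("10:15", "Euro", 2), ("11:00", "Yen", 50), ("11:35", "Euro", 2)]


def build_versions(history):
    """Group rate versions by currency (the primary key), sorted by time."""
    versions = {}
    for time, currency, rate in sorted(history):
        versions.setdefault(currency, []).append((time, rate))
    return versions


def rate_at(versions, currency, time):
    """Return the latest rate whose version time is <= `time`, or None."""
    history = versions.get(currency, [])
    times = [t for t, _ in history]
    idx = bisect_right(times, time)
    return history[idx - 1][1] if idx else None


def rates(versions, time):
    """Snapshot of all currencies at `time`, like SELECT * FROM Rates(time)."""
    snapshot = {c: rate_at(versions, c, time) for c in versions}
    return {c: r for c, r in snapshot.items() if r is not None}


versions = build_versions(rates_history)

# SELECT o.amount * r.rate FROM Orders o, LATERAL TABLE (Rates(o.time)) r
# WHERE o.currency = r.currency
results = [amount * rate_at(versions, currency, time)
           for time, currency, amount in orders]

print(rates(versions, "10:15"))
print(rates(versions, "11:50"))
print(results)
```

Only the versions that an order might still need have to be kept, which is why slide 33 bounds the memory of both inputs by the watermark delay.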