Samza SQL
Srinivasulu Punuru
Agenda
1 What is Samza SQL?
2 Why SQL on Samza?
3 How does it work?
4 Demo
5 Q&A
What is Samza SQL?
Samza SQL by Example
Count the page views of each member in a five-minute window.
Send the result to the Kafka topic PageViewCount.
Samza low level task API
Repartitioner Job
public class PageViewRepartitioner implements StreamTask {
  private final SystemStream outputStream = new SystemStream("kafka", "pvMemberId");

  @Override
  public void process(IncomingMessageEnvelope envelope, MessageCollector collector, TaskCoordinator coordinator) {
    PageViewEvent pageViewEvent = (PageViewEvent) envelope.getMessage();
    // Partition by member ID so that all page views of a member reach the same task.
    String key = pageViewEvent.getMemberId();
    // Arguments: output stream, partition key, message key, message.
    OutgoingMessageEnvelope outMessage = new OutgoingMessageEnvelope(outputStream, key, key, pageViewEvent);
    collector.send(outMessage);
  }
}
Samza low level task API (contd.)
Page view counter job
public class PageViewCounter implements StreamTask {
  private final SystemStream outputStream = new SystemStream("kafka", "pageviewCount");
  private Instant lastTriggerTime = Instant.now();
  private final HashMap<String, Integer> counter = new HashMap<>();

  @Override
  public void process(IncomingMessageEnvelope envelope, MessageCollector collector, TaskCoordinator coordinator) {
    PageViewEvent pageViewEvent = (PageViewEvent) envelope.getMessage();
    String memberId = pageViewEvent.getMemberId();
    counter.merge(memberId, 1, Integer::sum);
    // Emit and reset the counts once five minutes have elapsed.
    // The check only runs when a message arrives, so an idle stream delays the output.
    if (Duration.between(lastTriggerTime, Instant.now()).toMinutes() >= 5) {
      counter.forEach((key, value) -> collector.send(new OutgoingMessageEnvelope(outputStream, key, value)));
      counter.clear();
      // Start the next window; without this reset every later message would trigger an emit.
      lastTriggerTime = Instant.now();
    }
  }
}
Samza high level API
public class PageViewCountApplication implements StreamApplication {
  @Override
  public void init(StreamGraph graph, Config config) {
    MessageStream<PageViewEvent> pageViewEvents = graph.getInputStream("pageView");
    OutputStream<MyStreamOutput> pageViewCount = graph.getOutputStream("pageViewCount");
    pageViewEvents
        .partitionBy(m -> m.memberId)
        .window(Windows.keyedTumblingWindow(m -> m.memberId, Duration.ofMinutes(5),
            initialValue, (m, c) -> c + 1))
        .map(MyStreamOutput::new)
        .sendTo(pageViewCount);
  }
}
Samza SQL
INSERT INTO kafka.pageViewCount
SELECT memberId, count(*) FROM kafka.pageViewStream
GROUP BY memberId, TUMBLE(current_timestamp, INTERVAL '5' MINUTE)
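To make the tumbling-window grouping concrete, here is a minimal plain-Java sketch of what the query computes. It does not use Samza. The class TumblingWindowSketch, the PageView type and its explicit timestampMillis field are assumptions made for illustration (the query above groups on current_timestamp, i.e. processing time).

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: plain Java, no Samza dependency.
final class TumblingWindowSketch {
    // A page view with an explicit timestamp in epoch milliseconds (assumed field).
    static final class PageView {
        final String memberId;
        final long timestampMillis;

        PageView(String memberId, long timestampMillis) {
            this.memberId = memberId;
            this.timestampMillis = timestampMillis;
        }
    }

    // Start of the fixed-size, non-overlapping window that contains the timestamp.
    static long windowStart(long timestampMillis, Duration size) {
        long sizeMillis = size.toMillis();
        return Math.floorDiv(timestampMillis, sizeMillis) * sizeMillis;
    }

    // Counts page views per (window start, member), mirroring GROUP BY memberId, TUMBLE(...).
    static Map<Long, Map<String, Integer>> countPerWindow(List<PageView> views, Duration size) {
        Map<Long, Map<String, Integer>> counts = new TreeMap<>();
        for (PageView view : views) {
            long start = windowStart(view.timestampMillis, size);
            counts.computeIfAbsent(start, k -> new TreeMap<>())
                  .merge(view.memberId, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<PageView> views = Arrays.asList(
            new PageView("alice", 0L),
            new PageView("alice", 60_000L),
            new PageView("bob", 299_999L),
            new PageView("alice", 300_000L));
        // prints {0={alice=2, bob=1}, 300000={alice=1}}
        System.out.println(countPerWindow(views, Duration.ofMinutes(5)));
    }
}
```

Each event falls into exactly one window, which is what distinguishes a tumbling window from a sliding one.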
Samza API stack
Users can choose which API to use to write a Samza job.
Why SQL on Samza
• Expand the target audience of stream processing.
• Obtain quick real time insights.
• Create stream processing applications quickly.
How does it work?
How do we execute the SQL below on Samza?
INSERT INTO kafka.NewEmployees
SELECT firstName, lastName FROM kafka.profileUpdateStream
WHERE profile.newCompany = 'LinkedIn'
High level architecture
Samza SQL to Calcite relational algebra
INSERT INTO kafka.NewLinkedInEmployees
SELECT firstName, lastName FROM kafka.profileChange
WHERE profile.newCompany = 'LinkedIn'
LogicalTableModify
LogicalProject
LogicalFilter
LogicalTableScan
Samza operator graph conversion
LogicalTableModify
LogicalProject
LogicalFilter
LogicalTableScan
profileChange
.filter(p -> p.getNewCompany().equals("LinkedIn"))
.map(this::getFirstAndLastName)
.sendTo(newLinkedInEmployees);
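The conversion above can be sketched in plain Java as a recursive walk over the plan tree that emits one stream operator per node. This is only an illustration under stated assumptions: the classes Scan, Filter, Project and Modify are hypothetical stand-ins for the Calcite nodes, rows are modelled as maps, and java.util.stream takes the place of Samza's MessageStream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Stream;

// Hypothetical sketch: these classes only mimic the shape of the logical plan.
final class PlanTranslationSketch {
    interface PlanNode {}

    // Leaf node: reads rows from an in-memory list that stands in for a stream.
    static final class Scan implements PlanNode {
        final List<Map<String, Object>> rows;
        Scan(List<Map<String, Object>> rows) { this.rows = rows; }
    }

    static final class Filter implements PlanNode {
        final PlanNode input;
        final Predicate<Map<String, Object>> condition;
        Filter(PlanNode input, Predicate<Map<String, Object>> condition) {
            this.input = input;
            this.condition = condition;
        }
    }

    static final class Project implements PlanNode {
        final PlanNode input;
        final List<String> fields;
        Project(PlanNode input, String... fields) {
            this.input = input;
            this.fields = Arrays.asList(fields);
        }
    }

    // Root node: writes every row produced by its input into the sink.
    static final class Modify implements PlanNode {
        final PlanNode input;
        final List<Map<String, Object>> sink;
        Modify(PlanNode input, List<Map<String, Object>> sink) {
            this.input = input;
            this.sink = sink;
        }
    }

    // Translates the plan bottom-up: scan -> stream, filter -> filter(), project -> map().
    static Stream<Map<String, Object>> translate(PlanNode node) {
        if (node instanceof Scan) {
            return ((Scan) node).rows.stream();
        }
        if (node instanceof Filter) {
            Filter filter = (Filter) node;
            return translate(filter.input).filter(filter.condition);
        }
        if (node instanceof Project) {
            Project project = (Project) node;
            return translate(project.input).map(row -> {
                Map<String, Object> projected = new LinkedHashMap<>();
                for (String field : project.fields) {
                    projected.put(field, row.get(field));
                }
                return projected;
            });
        }
        throw new IllegalArgumentException("Unsupported plan node: " + node);
    }

    // The table-modify root plays the role of sendTo(): it drains the pipeline into the sink.
    static void execute(Modify root) {
        translate(root.input).forEach(root.sink::add);
    }

    static Map<String, Object> row(String firstName, String lastName, String newCompany) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("firstName", firstName);
        row.put("lastName", lastName);
        row.put("newCompany", newCompany);
        return row;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> profileChange = Arrays.asList(
            row("Ada", "Lovelace", "LinkedIn"),
            row("Alan", "Turing", "Elsewhere"));
        List<Map<String, Object>> newLinkedInEmployees = new ArrayList<>();
        Modify plan = new Modify(
            new Project(
                new Filter(new Scan(profileChange), r -> "LinkedIn".equals(r.get("newCompany"))),
                "firstName", "lastName"),
            newLinkedInEmployees);
        execute(plan);
        // prints [{firstName=Ada, lastName=Lovelace}]
        System.out.println(newLinkedInEmployees);
    }
}
```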
Samza SQL message flow
Samza SQL rel message format
public class SamzaSqlRelMessage {
private final List<Object> relFieldValues = new ArrayList<>();
private final List<String> relFieldNames = new ArrayList<>();
public List<String> getRelFieldNames() {
return relFieldNames;
}
public List<Object> getRelFieldValues() {
return this.relFieldValues;
}
}
• Simple relational format that represents a row in a table
• Ordered list of named values
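As a rough illustration of the "ordered list of named values" idea, the sketch below shows how a value could be looked up by field name when names and values are kept in two parallel lists. RelRowSketch and its getField method are hypothetical stand-ins written for this example, not the Samza class itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the row format on the slide; not the Samza class itself.
final class RelRowSketch {
    private final List<String> fieldNames;
    private final List<Object> fieldValues;

    RelRowSketch(List<String> fieldNames, List<Object> fieldValues) {
        if (fieldNames.size() != fieldValues.size()) {
            throw new IllegalArgumentException("Each field name needs exactly one value");
        }
        // Defensive copies keep the row immutable after construction.
        this.fieldNames = Collections.unmodifiableList(new ArrayList<>(fieldNames));
        this.fieldValues = Collections.unmodifiableList(new ArrayList<>(fieldValues));
    }

    List<String> getFieldNames() {
        return fieldNames;
    }

    List<Object> getFieldValues() {
        return fieldValues;
    }

    // Position i in the names list corresponds to position i in the values list.
    Optional<Object> getField(String name) {
        int index = fieldNames.indexOf(name);
        return index < 0 ? Optional.empty() : Optional.ofNullable(fieldValues.get(index));
    }

    public static void main(String[] args) {
        RelRowSketch row = new RelRowSketch(
            Arrays.asList("firstName", "lastName"),
            Arrays.<Object>asList("Ada", "Lovelace"));
        System.out.println(row.getField("lastName").orElse(null)); // prints Lovelace
        System.out.println(row.getField("company").isPresent());   // prints false
    }
}
```

Keeping the order explicit matters because a projection or an INSERT maps columns by position as well as by name.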
Pluggable input/output resolvers
INSERT INTO kafka.NewEmployees
SELECT firstName, lastName FROM kafka.profileUpdateStream
WHERE profile.newCompany = 'LinkedIn'
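One way to picture a pluggable resolver is as a small interface that turns a SQL source name such as kafka.profileUpdateStream into a system name and a stream name. The sketch below assumes a simple "<system>.<stream>" naming rule. SourceResolverSketch, SourceResolver and ResolvedSource are hypothetical names for illustration and are not taken from Samza's API.

```java
// Hypothetical sketch of a source resolver; the names are illustrative, not Samza's API.
final class SourceResolverSketch {
    // The physical location a SQL source name resolves to.
    static final class ResolvedSource {
        final String systemName;
        final String streamName;

        ResolvedSource(String systemName, String streamName) {
            this.systemName = systemName;
            this.streamName = streamName;
        }

        @Override
        public String toString() {
            return "system=" + systemName + ", stream=" + streamName;
        }
    }

    // The plug-in point: alternative implementations could consult a config or a catalog.
    interface SourceResolver {
        ResolvedSource resolve(String sqlSourceName);
    }

    // Strategy assumed here: "<system>.<stream>", split at the first dot.
    static final SourceResolver DOT_SEPARATED = name -> {
        int dot = name.indexOf('.');
        if (dot <= 0 || dot == name.length() - 1) {
            throw new IllegalArgumentException("Expected <system>.<stream> but got: " + name);
        }
        return new ResolvedSource(name.substring(0, dot), name.substring(dot + 1));
    };

    public static void main(String[] args) {
        // prints system=kafka, stream=profileUpdateStream
        System.out.println(DOT_SEPARATED.resolve("kafka.profileUpdateStream"));
    }
}
```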
Samza SQL architecture
Demo
Demo setup
How do you use it?
• Samza SQL is available in the Samza 0.14 release.
• Tutorial – http://bit.ly/samzasql
Samza 0.14
• Samza SQL
• Projection, Filtering, UDFs, Flatten, Union, Avro
• Apache Beam runner for Samza
• Azure Event Hubs support
• Amazon Kinesis support
• Multi stage batch support
• High level API improvements
• Durable state
• Programmable SerDe
Samza SQL - Future
• Joins (Stream-Stream & Stream-Table)
• Aggregates & aggregate UDF
• Full Subquery support
• Samza SQL as a service
Questions?
Thank you
Pluggable schema and message converters

Stream Processing using Samza SQL
