SlideShare una empresa de Scribd logo
1 de 23
Descargar para leer sin conexión
Machine Learning on
Streaming Data
with Apache Kafka, Apache Beam, & TensorFlow
About Us
Mikhail Chrestkha
Machine Learning Specialist
Google Cloud
linkedin.com/in/mchrestkha
Stéphane Maarek
CEO & Kafka Instructor
DataCumulus
linkedin.com/in/stephanemaarek
Big Thanks to:
Julianne Cuneo
Big Data Specialist, Google Cloud
Kai Waehner
Technology Evangelist, Confluent
Agenda
1. Motivation
2. Architecture
3. Use Case Walk-Through w/ Demo
4. Summary
1 Motivation
Technology Landscape
Smart
Analytics
Streaming
InfoWorld’s 2019 Technology of the
Year Award Winners:
● Apache Beam
● Apache Kafka
● Elastic Stack
● DataStax Enterprise
● Firebase
● Horovod
● H2O Driverless AI
● Keras
● Kubernetes
● LLVM
● .Net Core
● PyTorch
● Redis
● TensorFlow
● Visual Studio Code
● XGBoost
Cloud
?
https://www.globenewswire.com/news-release/2019/01/30/1707685/0/en/InfoWorld-Announces-2019-Technology-of-the-Year-Award-Winners.html
Data Ingestion
Data Analysis &
Transformation
Trainer
Model Evaluation
& Validation
Serving
Notebook
Orchestration
ML Framework
ML Platform
OSS Managed Service
Apache Kafka
Event streaming platform
Confluent Cloud
Monitoring, Replication, Data Balancing
Apache Beam
Data processing pipelines
Unified batch & streaming
Dataflow
Automated resource management of workers
TensorFlow
Robust foundation for machine and
deep learning
Cloud Machine Learning Engine
● Training: Distributed training infrastructure that supports
CPUs, GPUs, and TPUs
● Serving: Host models for batch & online prediction
2 Architecture
Reference Kafka ML Architecture
● Data pipelines are simplified
● Building analytic modules is decoupled
from servicing them
● Usage of real time or batch as needed
● Analytic models can be deployed in a
performant, scalable and
mission-critical environment
Kai Waehner
Technology Evangelist, Confluent
https://www.confluent.io/blog/build-deploy-scalable-machine-learning-production-apache-kafka/
Confluent Cloud
Managed by Confluent Analytics, ML training & deployment path
ML serving path
Data warehouse
BigQuery
ML Training
Cloud ML Engine
Topic 1
Raw transaction
Topic 2
Predictions
Kafka
Cluster
Processing
Cloud Dataflow
Leverage managed services to simplify & focus on code not infrastructure
Producer
Consumer
Consumer
ML Notebook
KSQL
SQL Submit ML
Training jobs
ML Serving
Cloud ML Engine
ML notebook development / experimentation
Deploy
ML model
Automate w/
AirFlow
Dataflow
Template
3 Use Case Walk-Through
Kaggle Case Study
Fraud Detection of Credit Card Transactions
● Collect transaction data
● Analyze historical data
● Train model on historic sample
● Evaluate model based on precision & recall
● Predict fraud on new streaming data
492
Fraud (0.172%)
284,807
transactions
● Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In
Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015
● Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Ael; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a
practitioner perspective, Expert systems with applications,41,10,4915-4928,2014, Pergamon
● Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning
strategy, IEEE transactions on neural networks and learning systems,29,8,3784-3797,2018,IEEE
○ Dal Pozzolo, Andrea Adaptive Machine learning for credit card fraud detection ULB MLG PhD thesis (supervised by G. Bontempi)
● Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-Aël; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming
credit card fraud detection with Spark, Information fusion,41, 182-194,2018,Elsevier
● Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection:
assessment and visualization, International Journal of Data Science and Analytics, 5,4,285-300,2018,Springer International Publishing
https://opendatacommons.org/licenses/dbcl/1.0/
DEMO 1 - 5 min
Sending our credit card data
Confluent Cloud, Creating a Topic, Python Script, Security
Kafka to BigQuery
Dataflow Template
Java Code
KafkaIO.<String, String>read() BigQueryIO.writeTableRows()
Create a template for easy re-usability by an analyst
Images from https://beam.apache.org/documentation/pipelines/design-your-pipeline/
redacted
Explore data & train ML model
from ksql import KSQLAPI
redacted
%%bigquery
redacted
gcloud ml-engine jobs
submit training
redacted
Query directly from topic Query petabytes of data Submit ML training job
DEMO 2 - 5 min
Dataflow template & job
Jupyter: KSQL, BQML, TensorFlow CMLE job
Send Predictions back to Kafka
Java Code
Cloud Machine Learning
Engine
Request Response
Hosted ML Model
Image from https://beam.apache.org/documentation/pipelines/design-your-pipeline/
Publish models
KafkaIO.<String, String>read() KafkaIO.<String, String>write()
Train
Model
DEMO 3 - 5 min
(1) Deploy model as an end point
(2) Prediction sent to Kafka topic to be consumed
(3) Track models & monitor predictions in CMLE UI
Futuristic Architecture: Pure Kafka-based ML
Resilient, highly available, sync & async
Confluent Cloud
Managed by Confluent
Topic 1
Raw transaction
Topic 2
Predictions
Producer
Consumer
Consumer
Kafka Streams ML
Synchronous
Application
training
serving
Interactive
Query API
gRPC or
REST API
Internal Topic
ML Model
(compacted?)
Model
state
4 Summary
Summary
● Kafka + Beam + TensorFlow = Great foundation
for future
○ Batch today → streaming tomorrow
○ Small data → big data tomorrow
○ Shallow learning today → deep learning tomorrow
● Make data & ML easier for yourself by using
managed services
● Build for many other use cases:
○ Predictive maintenance
○ Logistics routing
○ Image search & recommendations in e-commerce
Smart
AnalyticsStreaming
Cloud
Talk to Google Cloud
K1Booth
Learn More
Blog: Enabling connected
transformation with Apache Kafka
and TensorFlow on Google Cloud
Platform
bit.ly/2CHERol
KafkaIO on Beam
bit.ly/2YwL3Jc
KafkaToBigQuery Dataflow
Template Example
bit.ly/2HQqVN0
Contact us
linkedin.com/in/mchrestkha
linkedin.com/in/stephanemaarek
Confluent Cloud
Managed by Confluent Analytics, ML training & deployment path
ML serving path
Data warehouse
BigQuery
ML Training
Cloud ML Engine
Topic 1
Raw transaction
Topic 2
Predictions
Kafka
Cluster
Processing
Cloud Dataflow
Questions
Producer
Consumer
Consumer
Cloud ML Notebook
KSQL
SQL Submit ML
Training jobs
ML Serving
Cloud ML Engine
ML notebook development / experimentation
Deploy
ML model
Automate w/
AirFlow
Dataflow
Template

Más contenido relacionado

La actualidad más candente

Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataGetInData
 
Sentiment analysis of Twitter Data
Sentiment analysis of Twitter DataSentiment analysis of Twitter Data
Sentiment analysis of Twitter DataNurendra Choudhary
 
Kafka Summit SF 2017 - Best Practices for Running Kafka on Docker Containers
Kafka Summit SF 2017 - Best Practices for Running Kafka on Docker ContainersKafka Summit SF 2017 - Best Practices for Running Kafka on Docker Containers
Kafka Summit SF 2017 - Best Practices for Running Kafka on Docker Containersconfluent
 
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh BhatnagarScalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh BhatnagarDatabricks
 
[124] mit cheetah 로봇의 탄생
[124] mit cheetah 로봇의 탄생[124] mit cheetah 로봇의 탄생
[124] mit cheetah 로봇의 탄생NAVER D2
 
Swagger With REST APIs.pptx.pdf
Swagger With REST APIs.pptx.pdfSwagger With REST APIs.pptx.pdf
Swagger With REST APIs.pptx.pdfKnoldus Inc.
 
Apache Superset - open source data exploration and visualization (Conclusion ...
Apache Superset - open source data exploration and visualization (Conclusion ...Apache Superset - open source data exploration and visualization (Conclusion ...
Apache Superset - open source data exploration and visualization (Conclusion ...Lucas Jellema
 
Selenium test automation
Selenium test automationSelenium test automation
Selenium test automationSrikanth Vuriti
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Databricks
 
stackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviatestackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with WeaviateNETWAYS
 
Understanding InfluxDB’s New Storage Engine
Understanding InfluxDB’s New Storage EngineUnderstanding InfluxDB’s New Storage Engine
Understanding InfluxDB’s New Storage EngineInfluxData
 
Metaverse - The Future of Internet
Metaverse - The Future of InternetMetaverse - The Future of Internet
Metaverse - The Future of Internetrajdave38
 
Introduction into Icinga
Introduction into IcingaIntroduction into Icinga
Introduction into IcingaIcinga
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflowDatabricks
 
A story of Netflix and AB Testing in the User Interface using DynamoDB - DAT3...
A story of Netflix and AB Testing in the User Interface using DynamoDB - DAT3...A story of Netflix and AB Testing in the User Interface using DynamoDB - DAT3...
A story of Netflix and AB Testing in the User Interface using DynamoDB - DAT3...Amazon Web Services
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkSlim Baltagi
 

La actualidad más candente (20)

Telerik
TelerikTelerik
Telerik
 
Katalon Studio Presentation.pptx
Katalon Studio Presentation.pptxKatalon Studio Presentation.pptx
Katalon Studio Presentation.pptx
 
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
 
Sentiment analysis of Twitter Data
Sentiment analysis of Twitter DataSentiment analysis of Twitter Data
Sentiment analysis of Twitter Data
 
Kafka Summit SF 2017 - Best Practices for Running Kafka on Docker Containers
Kafka Summit SF 2017 - Best Practices for Running Kafka on Docker ContainersKafka Summit SF 2017 - Best Practices for Running Kafka on Docker Containers
Kafka Summit SF 2017 - Best Practices for Running Kafka on Docker Containers
 
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh BhatnagarScalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar
 
Selenium Automation Framework
Selenium Automation  FrameworkSelenium Automation  Framework
Selenium Automation Framework
 
[124] mit cheetah 로봇의 탄생
[124] mit cheetah 로봇의 탄생[124] mit cheetah 로봇의 탄생
[124] mit cheetah 로봇의 탄생
 
Swagger With REST APIs.pptx.pdf
Swagger With REST APIs.pptx.pdfSwagger With REST APIs.pptx.pdf
Swagger With REST APIs.pptx.pdf
 
Apache Superset - open source data exploration and visualization (Conclusion ...
Apache Superset - open source data exploration and visualization (Conclusion ...Apache Superset - open source data exploration and visualization (Conclusion ...
Apache Superset - open source data exploration and visualization (Conclusion ...
 
Selenium test automation
Selenium test automationSelenium test automation
Selenium test automation
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
stackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviatestackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviate
 
Understanding InfluxDB’s New Storage Engine
Understanding InfluxDB’s New Storage EngineUnderstanding InfluxDB’s New Storage Engine
Understanding InfluxDB’s New Storage Engine
 
Metaverse - The Future of Internet
Metaverse - The Future of InternetMetaverse - The Future of Internet
Metaverse - The Future of Internet
 
Introduction into Icinga
Introduction into IcingaIntroduction into Icinga
Introduction into Icinga
 
Scrapy.for.dummies
Scrapy.for.dummiesScrapy.for.dummies
Scrapy.for.dummies
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
 
A story of Netflix and AB Testing in the User Interface using DynamoDB - DAT3...
A story of Netflix and AB Testing in the User Interface using DynamoDB - DAT3...A story of Netflix and AB Testing in the User Interface using DynamoDB - DAT3...
A story of Netflix and AB Testing in the User Interface using DynamoDB - DAT3...
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
 

Similar a Machine Learning on Streaming Data using Kafka, Beam, and TensorFlow (Mikhail Chrestkha, Google Cloud; Stephane Maarek, DataCumulus) Kafka Summit NYC 2019

Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...HostedbyConfluent
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Databricks
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...James Anderson
 
Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Tushar Katarki
 
Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Sri Ambati
 
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...ScyllaDB
 
Yu_Wang_Resume
Yu_Wang_ResumeYu_Wang_Resume
Yu_Wang_ResumeRyan Wang
 
Deep learning for FinTech
Deep learning for FinTechDeep learning for FinTech
Deep learning for FinTechgeetachauhan
 
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...Codemotion
 
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...Kai Wähner
 
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...Henry Saputra
 
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Provectus
 
Production machine learning: Managing models, workflows and risk at scale
Production machine learning: Managing models, workflows and risk at scaleProduction machine learning: Managing models, workflows and risk at scale
Production machine learning: Managing models, workflows and risk at scaleAlex Housley
 
Hyf project ideas_02
Hyf project ideas_02Hyf project ideas_02
Hyf project ideas_02KatoK1
 
The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?Ivo Andreev
 
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...Databricks
 

Similar a Machine Learning on Streaming Data using Kafka, Beam, and TensorFlow (Mikhail Chrestkha, Google Cloud; Stephane Maarek, DataCumulus) Kafka Summit NYC 2019 (20)

Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
DEVOPS AND MACHINE LEARNING
DEVOPS AND MACHINE LEARNINGDEVOPS AND MACHINE LEARNING
DEVOPS AND MACHINE LEARNING
 
Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session
 
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
 
Yu_Wang_Resume
Yu_Wang_ResumeYu_Wang_Resume
Yu_Wang_Resume
 
Deep learning for FinTech
Deep learning for FinTechDeep learning for FinTech
Deep learning for FinTech
 
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...
 
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
 
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
 
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
 
Monitoring AI with AI
Monitoring AI with AIMonitoring AI with AI
Monitoring AI with AI
 
Production machine learning: Managing models, workflows and risk at scale
Production machine learning: Managing models, workflows and risk at scaleProduction machine learning: Managing models, workflows and risk at scale
Production machine learning: Managing models, workflows and risk at scale
 
Hyf project ideas_02
Hyf project ideas_02Hyf project ideas_02
Hyf project ideas_02
 
The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?
 
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...
Navigating the ML Pipeline Jungle with MLflow: Notes from the Field with Thun...
 

Más de confluent

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 

Más de confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Último

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 

Último (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Machine Learning on Streaming Data using Kafka, Beam, and TensorFlow (Mikhail Chrestkha, Google Cloud; Stephane Maarek, DataCumulus) Kafka Summit NYC 2019

  • 1. Machine Learning on Streaming Data with Apache Kafka, Apache Beam, & TensorFlow
  • 2. About Us Mikhail Chrestkha Machine Learning Specialist Google Cloud linkedin.com/in/mchrestkha Stéphane Maarek CEO & Kafka Instructor DataCumulus linkedin.com/in/stephanemaarek Big Thanks to: Julianne Cuneo Big Data Specialist, Google Cloud Kai Waehner Technology Evangelist, Confluent
  • 3. Agenda 1. Motivation 2. Architecture 3. Use Case Walk-Through w/ Demo 4. Summary
  • 5. Technology Landscape Smart Analytics Streaming InfoWorld’s 2019 Technology of the Year Award Winners: ● Apache Beam ● Apache Kafka ● Elastic Stack ● DataStax Enterprise ● Firebase ● Horovod ● H2O Driverless AI ● Keras ● Kubernetes ● LLVM ● .Net Core ● PyTorch ● Redis ● TensorFlow ● Visual Studio Code ● XGBoost Cloud ? https://www.globenewswire.com/news-release/2019/01/30/1707685/0/en/InfoWorld-Announces-2019-Technology-of-the-Year-Award-Winners.html
  • 6. Data Ingestion Data Analysis & Transformation Trainer Model Evaluation & Validation Serving Notebook Orchestration ML Framework ML Platform
  • 7. OSS Managed Service Apache Kafka Event streaming platform Confluent Cloud Monitoring, Replication, Data Balancing Apache Beam Data processing pipelines Unified batch & streaming Dataflow Automated resource management of workers TensorFlow Robust foundation for machine and deep learning Cloud Machine Learning Engine ● Training: Distributed training infrastructure that supports CPUs, GPUs, and TPUs ● Serving: Host models for batch & online prediction
  • 9. Reference Kafka ML Architecture ● Data pipelines are simplified ● Building analytic modules is decoupled from servicing them ● Usage of real time or batch as needed ● Analytic models can be deployed in a performant, scalable and mission-critical environment Kai Waehner Technology Evangelist, Confluent https://www.confluent.io/blog/build-deploy-scalable-machine-learning-production-apache-kafka/
  • 10. Confluent Cloud Managed by Confluent Analytics, ML training & deployment path ML serving path Data warehouse BigQuery ML Training Cloud ML Engine Topic 1 Raw transaction Topic 2 Predictions Kafka Cluster Processing Cloud Dataflow Leverage managed services to simplify & focus on code not infrastructure Producer Consumer Consumer ML Notebook KSQL SQL Submit ML Training jobs ML Serving Cloud ML Engine ML notebook development / experimentation Deploy ML model Automate w/ AirFlow Dataflow Template
  • 11. 3 Use Case Walk-Through
  • 12. Kaggle Case Study Fraud Detection of Credit Card Transactions ● Collect transaction data ● Analyze historical data ● Train model on historic sample ● Evaluate model based on precision & recall ● Predict fraud on new streaming data 492 Fraud (0.172%) 284,807 transactions ● Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015 ● Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Ael; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective, Expert systems with applications,41,10,4915-4928,2014, Pergamon ● Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy, IEEE transactions on neural networks and learning systems,29,8,3784-3797,2018,IEEE ○ Dal Pozzolo, Andrea Adaptive Machine learning for credit card fraud detection ULB MLG PhD thesis (supervised by G. Bontempi) ● Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-Aël; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark, Information fusion,41, 182-194,2018,Elsevier ● Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization, International Journal of Data Science and Analytics, 5,4,285-300,2018,Springer International Publishing https://opendatacommons.org/licenses/dbcl/1.0/
  • 13. DEMO 1 - 5 min Sending our credit card data Confluent Cloud, Creating a Topic, Python Script, Security
  • 14. Kafka to BigQuery Dataflow Template Java Code KafkaIO.<String, String>read() BigQueryIO.writeTableRows() Create a template for easy re-usability by an analyst Images from https://beam.apache.org/documentation/pipelines/design-your-pipeline/ redacted
  • 15. Explore data & train ML model from ksql import KSQLAPI redacted %%bigquery redacted gcloud ml-engine jobs submit training redacted Query directly from topic Query petabytes of data Submit ML training job
  • 16. DEMO 2 - 5 min Dataflow template & job Jupyter: KSQL, BQML, TensorFlow CMLE job
  • 17. Send Predictions back to Kafka Java Code Cloud Machine Learning Engine Request Response Hosted ML Model Image from https://beam.apache.org/documentation/pipelines/design-your-pipeline/ Publish models KafkaIO.<String, String>read() KafkaIO.<String, String>write() Train Model
  • 18. DEMO 3 - 5 min (1) Deploy model as an end point (2) Prediction sent to Kafka topic to be consumed (3) Track models & monitor predictions in CMLE UI
  • 19. Futuristic Architecture: Pure Kafka-based ML Resilient, highly available, sync & async Confluent Cloud Managed by Confluent Topic 1 Raw transaction Topic 2 Predictions Producer Consumer Consumer Kafka Streams ML Synchronous Application training serving Interactive Query API gRPC or REST API Internal Topic ML Model (compacted?) Model state
  • 21. Summary ● Kafka + Beam + TensorFlow = Great foundation for future ○ Batch today → streaming tomorrow ○ Small data → big data tomorrow ○ Shallow learning today → deep learning tomorrow ● Make data & ML easier for yourself by using managed services ● Build for many other use cases: ○ Predictive maintenance ○ Logistics routing ○ Image search & recommendations in e-commerce Smart AnalyticsStreaming Cloud
  • 22. Talk to Google Cloud K1Booth Learn More Blog: Enabling connected transformation with Apache Kafka and TensorFlow on Google Cloud Platform bit.ly/2CHERol KafkaIO on Beam bit.ly/2YwL3Jc KafkaToBigQuery Dataflow Template Example bit.ly/2HQqVN0 Contact us linkedin.com/in/mchrestkha linkedin.com/in/stephanemaarek
  • 23. Confluent Cloud Managed by Confluent Analytics, ML training & deployment path ML serving path Data warehouse BigQuery ML Training Cloud ML Engine Topic 1 Raw transaction Topic 2 Predictions Kafka Cluster Processing Cloud Dataflow Questions Producer Consumer Consumer Cloud ML Notebook KSQL SQL Submit ML Training jobs ML Serving Cloud ML Engine ML notebook development / experimentation Deploy ML model Automate w/ AirFlow Dataflow Template