Meetup - Brasil - Data In Motion - 2023 September 19

Timothy Spann
Timothy SpannDeveloper Advocate en StreamNative
Data in Motion:
Overview e Novidades do
NiFi, Kafka e Flink
Tim Spann - Principal Developer Advocate
Data In Motion
Meetup - Brasil - Data In Motion - 2023 September 19
3
© 2023 Cloudera, Inc. All rights reserved.
TODAY’S LEAD
Who am I?
@PaasDev
DZone Zone Leader and Big Data MVB
Princeton and NYC Future of Data Meetups
ex-Pivotal Field Engineer ex-StreamNative ex-PwC
https://github.com/tspannhw https://twitter.com/PaaSDev
https://www.datainmotion.dev/
https://medium.com/@tspann
Principal Data-in-Motion Developer Advocate
4
© 2023 Cloudera, Inc. All rights reserved.
Data in Motion: Overview e Novidades do NiFi, Kafka e Flink
Apresentador: Tim Spann - Principal DIM Specialist and Developer Advocate
Intro to NiFi
Intro to Kafka
Intro to Flink
Together as FLaNK
Demos
Q&A
© 2023 Cloudera, Inc. All rights reserved. 5
REAL-TIME REQUIRES A PLATFORM
SQL
Stream
Builder
© 2023 Cloudera, Inc. All rights reserved. 6
REST API ARCHITECTURE - Using FLaNK to pull the data out of anything in near-real time
INGEST PREPARE PUBLISH
DATA SOURCES
Internal Users
(After Sales)
External
Systems
ENTERPRISE
LAKEHOUSE
CAPABILITY VIEW
INGESTION
MESSAGE HUB
STORAGE
BATCH
MANAGEMENT
STREAM
CONSUMPTION
Closed Loop
Systems
SQL Stream Builder
Machine Learning
Data Visualization
Workload Manager
watsonx.data
Cloudera DataFlow - Apache NiFi
© 2019 Cloudera, Inc. All rights reserved. 8
CLOUDERA DATAFLOW - POWERED BY APACHE NiFi
Ingest and manage data from edge-to-cloud using a no-code interface
● #1 data ingestion/movement engine
● Strong community
● Product maturity over 11 years
● Deploy on-premises or in the cloud
● Over 400+ pre-built processors
● Built-in data provenance
● Guaranteed delivery
● Throttling and Back pressure
© 2023 Cloudera, Inc. All rights reserved. 9
PROVENANCE
10
© 2023 Cloudera, Inc. All rights reserved.
RECORD-ORIENTED DATA WITH NIFI
• Record Readers - Avro, CSV, Grok, IPFIX, JSAN1, JSON, Parquet,
Scripted, Syslog5424, Syslog, WindowsEvent, XML
• Record Writers - Avro, CSV, FreeFromText, Json, Parquet,
Scripted, XML
• Record Reader and Writer support referencing a schema registry
for retrieving schemas when necessary.
• Enable processors that accept any data format without having to
worry about the parsing and serialization logic.
• Allows us to keep FlowFiles larger, each consisting of multiple
records, which results in far better performance.
11
© 2023 Cloudera, Inc. All rights reserved.
RUNNING SQL ON FLOWFILES
• Evaluates one or more SQL queries against the contents of a
FlowFile.
• This can be used, for example, for field-specific filtering,
transformation, and row-level filtering.
• Columns can be renamed, simple calculations and aggregations
performed.
• The SQL statement must be valid ANSI SQL and is powered by
Apache Calcite.
12
© 2023 Cloudera, Inc. All rights reserved.
READYFLOW
GALLERY
• Cloudera provided flow
definitions
• Cover most common data flow
use cases
• Optimized to work with CDP
sources/destinations
• Can be deployed and adjusted
as needed
Cloudera Streams Messaging
Manager - Apache Kafka
14
© 2023 Cloudera, Inc. All rights reserved.
STREAMS MESSAGING WITH KAFKA
• Highly reliable distributed messaging system.
• Decouple applications, enables many-to-many
patterns.
• Publish-Subscribe semantics.
• Horizontal scalability.
• Efficient implementation to operate at speed with
big data volumes.
• Organized by topic to support several use cases.
Cloudera SQL Stream Builder - Flink
SQL
16
© 2023 Cloudera, Inc. All rights reserved.
DELIVERING STREAMING ANALYTICS
1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. (second)
SQL
Parsing and
Blending Data
Streaming
Analytics
Both offline and
streaming data
Data Analysts Can
Write Queries
Across the Lines of Business
Capture Events
that Matter
Low-latency analytics use
cases
Events
Processing
17
© 2022 Cloudera, Inc. All rights reserved.
SQL STREAM BUILDER (SSB)
SQL STREAM BUILDER allows
developers, analysts, and data
scientists to write streaming
applications with industry
standard SQL.
No Java or Scala code
development required.
Simplifies access to data in Kafka
& Flink. Connectors to batch data in
HDFS, Kudu, Hive, S3, JDBC, CDC
and more
Enrich streaming data with batch
data in a single tool
Democratize access to real-time data with just SQL
18
© 2023 Cloudera, Inc. All rights reserved.
SSB MATERIALIZED VIEWS
Key Takeaway; MV’s allow data scientist, analyst and developers consume data from the firehose
Demo
20
© 2023 Cloudera, Inc. All rights reserved.
Data in Motion: Overview e Novidades do NiFi, Kafka e Flink
Apresentador: Tim Spann - Principal DIM Specialist and Developer Advocate
21
© 2023 Cloudera, Inc. All rights reserved.
FREE LEARNING ENVIRONMENT
23
© 2023 Cloudera, Inc. All rights reserved.
Cloudera Streams
Processing -
Community Edition
• Kafka, KConnect, SMM, SR,
Flink, and SSB in Docker
• Runs in Docker
• Try new features quickly
• Develop applications locally
● Docker compose file of CSP to run from command line w/o any
dependencies, including Flink, SQL Stream Builder, Kafka, Kafka
Connect, Streams Messaging Manager and Schema Registry
○ $> docker compose up
● Licensed under the Cloudera Community License
● Unsupported
● Community Group Hub for CSP
● Find it on docs.cloudera.com under Applications
Open Source Edition
• Apache NiFi in Docker
• Runs in Docker
• Try new features
quickly
• Develop applications
locally
● Docker NiFi
○ docker run --name nifi -p 8443:8443 -d -e
SINGLE_USER_CREDENTIALS_USERNAME=admin -e
SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUgh
vvgEvjnaLjFEB apache/nifi:latest
● Licensed under the ASF License
● Unsupported
https://hub.docker.com/r/apache/nifi
RESOURCES, WRAP-UP, Q&A
© 2023 Cloudera, Inc. All rights reserved. 26
Future of Data - NYC / Princeton + Virtual
@PaasDev
https://www.meetup.com/futureofdata-princeton/
https://www.meetup.com/futureofdata-newyork/
From Big Data to AI to Streaming to LLM to Cloud to
Analytics to NLP to Fast Data to Machine Learning to
Microservices to ...
https://medium.com/cloudera-inc/streaming-llm-with-apache-nifi-huggin
gface-ad2f0d367468
28
© 2023 Cloudera, Inc. All rights reserved.
Streaming Resources
• https://dzone.com/articles/real-time-stream-processing-with-hazelcast-an
d-streamnative
• https://flipstackweekly.com/
• https://www.datainmotion.dev/
• https://www.flankstack.dev/
• https://github.com/tspannhw
• https://medium.com/@tspann
• https://medium.com/@tspann/predictions-for-streaming-in-2023-ad4d739
5d714
• https://www.apachecon.com/acna2022/slides/04_Spann_Tim_Citizen_Str
eaming_Engineer.pdf
© 2023 Cloudera, Inc. All rights reserved. 29
FLaNK Stack Weekly
This week in Apache NiFi, Apache Flink, Apache
Kafka, Apache Spark, Apache Iceberg, Python,
Java and Open Source friends.
https://bit.ly/32dAJft
Generative AI
https://github.com/tspannhw/FLaNK-HuggingFace-DistilBert-SentimentAnalysis
https://github.com/tspannhw/FLaNK-LLM
watsonx.ai
LLM USE CASE
Vector DB
AI Model
Unstructured file types
Data in Motion
on Cloudera Data
Platform (CDP)
Capture, process &
distribute any data,
anywhere
Other enterprise data Open Data Lakehouse
Materialized Views
Structured Sources
Applications/API’s
Streams
Meetup - Brasil - Data In Motion - 2023 September 19
33
© 2023 Cloudera, Inc. All rights reserved.
TH N Y U
1 de 33

Recomendados

OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf por
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfOSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfTimothy Spann
23 vistas43 diapositivas
GSJUG: Mastering Data Streaming Pipelines 09May2023 por
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023Timothy Spann
255 vistas80 diapositivas
Building Real-Time Travel Alerts por
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel AlertsTimothy Spann
165 vistas48 diapositivas
JConWorld_ Continuous SQL with Kafka and Flink por
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
156 vistas36 diapositivas
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data por
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataTimothy Spann
193 vistas45 diapositivas
ITPC Building Modern Data Streaming Apps por
ITPC Building Modern Data Streaming AppsITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming AppsTimothy Spann
797 vistas64 diapositivas

Más contenido relacionado

Similar a Meetup - Brasil - Data In Motion - 2023 September 19

The Never Landing Stream with HTAP and Streaming por
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingTimothy Spann
254 vistas39 diapositivas
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp por
Using Apache NiFi with Apache Pulsar for Fast Data On-RampUsing Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Using Apache NiFi with Apache Pulsar for Fast Data On-RampTimothy Spann
163 vistas27 diapositivas
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023 por
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023ssuser73434e
54 vistas79 diapositivas
Best Practices For Workflow por
Best Practices For WorkflowBest Practices For Workflow
Best Practices For WorkflowTimothy Spann
89 vistas86 diapositivas
Kafka for DBAs por
Kafka for DBAsKafka for DBAs
Kafka for DBAsGwen (Chen) Shapira
12.8K vistas41 diapositivas
Introduction to Apache NiFi 1.10 por
Introduction to Apache NiFi 1.10Introduction to Apache NiFi 1.10
Introduction to Apache NiFi 1.10Timothy Spann
2K vistas24 diapositivas

Similar a Meetup - Brasil - Data In Motion - 2023 September 19(20)

The Never Landing Stream with HTAP and Streaming por Timothy Spann
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
Timothy Spann254 vistas
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp por Timothy Spann
Using Apache NiFi with Apache Pulsar for Fast Data On-RampUsing Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Timothy Spann163 vistas
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023 por ssuser73434e
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
ssuser73434e54 vistas
Best Practices For Workflow por Timothy Spann
Best Practices For WorkflowBest Practices For Workflow
Best Practices For Workflow
Timothy Spann89 vistas
Introduction to Apache NiFi 1.10 por Timothy Spann
Introduction to Apache NiFi 1.10Introduction to Apache NiFi 1.10
Introduction to Apache NiFi 1.10
Timothy Spann2K vistas
Meet the Committers Webinar_ Lab Preparation por Timothy Spann
Meet the Committers Webinar_ Lab PreparationMeet the Committers Webinar_ Lab Preparation
Meet the Committers Webinar_ Lab Preparation
Timothy Spann32 vistas
CoC23_Utilizing Real-Time Transit Data for Travel Optimization por Timothy Spann
CoC23_Utilizing Real-Time Transit Data for Travel OptimizationCoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel Optimization
Timothy Spann31 vistas
Reinventing Kafka in the Data Streaming Era - Jun Rao por confluent
Reinventing Kafka in the Data Streaming Era - Jun RaoReinventing Kafka in the Data Streaming Era - Jun Rao
Reinventing Kafka in the Data Streaming Era - Jun Rao
confluent143 vistas
Part 2: A Visual Dive into Machine Learning and Deep Learning 
 por Cloudera, Inc.
Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

Cloudera, Inc.1.5K vistas
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends por Timothy Spann
PortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and FriendsPortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
Timothy Spann986 vistas
Best Practices for Building Hybrid-Cloud Architectures | Hans Jespersen por confluent
Best Practices for Building Hybrid-Cloud Architectures | Hans JespersenBest Practices for Building Hybrid-Cloud Architectures | Hans Jespersen
Best Practices for Building Hybrid-Cloud Architectures | Hans Jespersen
confluent403 vistas
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka... por Timothy Spann
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...
Live Demo Jam Expands: The Leading-Edge Streaming Data Platform with NiFi, Ka...
Timothy Spann519 vistas
Spark One Platform Webinar por Cloudera, Inc.
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
Cloudera, Inc.2.5K vistas
Music city data Hail Hydrate! from stream to lake por Timothy Spann
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
Timothy Spann708 vistas
Webinar | Better Together: Apache Cassandra and Apache Kafka por DataStax
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
DataStax881 vistas
End to End Streaming Architectures por Cloudera, Inc.
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming Architectures
Cloudera, Inc.3.5K vistas
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre... por HostedbyConfluent
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...
HostedbyConfluent330 vistas
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic... por HostedbyConfluent
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
HostedbyConfluent395 vistas

Más de Timothy Spann

[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines por
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data PipelinesTimothy Spann
150 vistas25 diapositivas
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo por
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoTimothy Spann
161 vistas8 diapositivas
CoC23_ Looking at the New Features of Apache NiFi por
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFiTimothy Spann
36 vistas24 diapositivas
CoC23_ Let’s Monitor The Conditions at the Conference por
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the ConferenceTimothy Spann
17 vistas17 diapositivas
Implement a Universal Data Distribution Architecture to Manage All Streaming ... por
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Timothy Spann
28 vistas56 diapositivas
big data fest building modern data streaming apps por
big data fest building modern data streaming appsbig data fest building modern data streaming apps
big data fest building modern data streaming appsTimothy Spann
317 vistas55 diapositivas

Más de Timothy Spann(17)

[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines por Timothy Spann
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
Timothy Spann150 vistas
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo por Timothy Spann
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Timothy Spann161 vistas
CoC23_ Looking at the New Features of Apache NiFi por Timothy Spann
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFi
Timothy Spann36 vistas
CoC23_ Let’s Monitor The Conditions at the Conference por Timothy Spann
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the Conference
Timothy Spann17 vistas
Implement a Universal Data Distribution Architecture to Manage All Streaming ... por Timothy Spann
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Timothy Spann28 vistas
big data fest building modern data streaming apps por Timothy Spann
big data fest building modern data streaming appsbig data fest building modern data streaming apps
big data fest building modern data streaming apps
Timothy Spann317 vistas
BestInFlowCompetitionTutorials03May2023 por Timothy Spann
BestInFlowCompetitionTutorials03May2023BestInFlowCompetitionTutorials03May2023
BestInFlowCompetitionTutorials03May2023
Timothy Spann11 vistas
CloudToolGuidance03May2023 por Timothy Spann
CloudToolGuidance03May2023CloudToolGuidance03May2023
CloudToolGuidance03May2023
Timothy Spann10 vistas
Cloudera Sandbox Event Guidelines For Workflow por Timothy Spann
Cloudera Sandbox Event Guidelines For WorkflowCloudera Sandbox Event Guidelines For Workflow
Cloudera Sandbox Event Guidelines For Workflow
Timothy Spann32 vistas
DevNexus: Apache Pulsar Development 101 with Java por Timothy Spann
DevNexus:  Apache Pulsar Development 101 with JavaDevNexus:  Apache Pulsar Development 101 with Java
DevNexus: Apache Pulsar Development 101 with Java
Timothy Spann261 vistas
Conf42 Python_ ML Enhanced Event Streaming Apps with Python Microservices por Timothy Spann
Conf42 Python_ ML Enhanced Event Streaming Apps with Python MicroservicesConf42 Python_ ML Enhanced Event Streaming Apps with Python Microservices
Conf42 Python_ ML Enhanced Event Streaming Apps with Python Microservices
Timothy Spann443 vistas
PythonWebConference_ Cloud Native Apache Pulsar Development 202 with Python por Timothy Spann
PythonWebConference_ Cloud Native Apache Pulsar Development 202 with PythonPythonWebConference_ Cloud Native Apache Pulsar Development 202 with Python
PythonWebConference_ Cloud Native Apache Pulsar Development 202 with Python
Timothy Spann430 vistas
PhillyJug Getting Started With Real-time Cloud Native Streaming With Java por Timothy Spann
PhillyJug  Getting Started With Real-time Cloud Native Streaming With JavaPhillyJug  Getting Started With Real-time Cloud Native Streaming With Java
PhillyJug Getting Started With Real-time Cloud Native Streaming With Java
Timothy Spann625 vistas
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud) por Timothy Spann
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Timothy Spann18 vistas
Living the Stream Dream with Pulsar and Spring Boot por Timothy Spann
Living the Stream Dream with Pulsar and Spring BootLiving the Stream Dream with Pulsar and Spring Boot
Living the Stream Dream with Pulsar and Spring Boot
Timothy Spann94 vistas
Let's keep it simple and streaming por Timothy Spann
Let's keep it simple and streamingLet's keep it simple and streaming
Let's keep it simple and streaming
Timothy Spann19 vistas
Sink Your Teeth into Streaming at Any Scale por Timothy Spann
Sink Your Teeth into Streaming at Any ScaleSink Your Teeth into Streaming at Any Scale
Sink Your Teeth into Streaming at Any Scale
Timothy Spann12 vistas

Último

Ukraine Infographic_22NOV2023_v2.pdf por
Ukraine Infographic_22NOV2023_v2.pdfUkraine Infographic_22NOV2023_v2.pdf
Ukraine Infographic_22NOV2023_v2.pdfAnastosiyaGurin
1.4K vistas3 diapositivas
Custom Tag Manager Templates por
Custom Tag Manager TemplatesCustom Tag Manager Templates
Custom Tag Manager TemplatesMarkus Baersch
30 vistas17 diapositivas
4_4_WP_4_06_ND_Model.pptx por
4_4_WP_4_06_ND_Model.pptx4_4_WP_4_06_ND_Model.pptx
4_4_WP_4_06_ND_Model.pptxd6fmc6kwd4
7 vistas13 diapositivas
Customer Data Cleansing Project.pptx por
Customer Data Cleansing Project.pptxCustomer Data Cleansing Project.pptx
Customer Data Cleansing Project.pptxNat O
6 vistas23 diapositivas
Dr. Ousmane Badiane-2023 ReSAKSS Conference por
Dr. Ousmane Badiane-2023 ReSAKSS ConferenceDr. Ousmane Badiane-2023 ReSAKSS Conference
Dr. Ousmane Badiane-2023 ReSAKSS ConferenceAKADEMIYA2063
5 vistas34 diapositivas
Inawisdom Quick Sight por
Inawisdom Quick SightInawisdom Quick Sight
Inawisdom Quick SightPhilipBasford
7 vistas27 diapositivas

Último(20)

Ukraine Infographic_22NOV2023_v2.pdf por AnastosiyaGurin
Ukraine Infographic_22NOV2023_v2.pdfUkraine Infographic_22NOV2023_v2.pdf
Ukraine Infographic_22NOV2023_v2.pdf
AnastosiyaGurin1.4K vistas
4_4_WP_4_06_ND_Model.pptx por d6fmc6kwd4
4_4_WP_4_06_ND_Model.pptx4_4_WP_4_06_ND_Model.pptx
4_4_WP_4_06_ND_Model.pptx
d6fmc6kwd47 vistas
Customer Data Cleansing Project.pptx por Nat O
Customer Data Cleansing Project.pptxCustomer Data Cleansing Project.pptx
Customer Data Cleansing Project.pptx
Nat O6 vistas
Dr. Ousmane Badiane-2023 ReSAKSS Conference por AKADEMIYA2063
Dr. Ousmane Badiane-2023 ReSAKSS ConferenceDr. Ousmane Badiane-2023 ReSAKSS Conference
Dr. Ousmane Badiane-2023 ReSAKSS Conference
AKADEMIYA20635 vistas
K-Drama Recommendation Using Python por FridaPutriassa
K-Drama Recommendation Using PythonK-Drama Recommendation Using Python
K-Drama Recommendation Using Python
FridaPutriassa5 vistas
Listed Instruments Survey 2022.pptx por secretariat4
Listed Instruments Survey  2022.pptxListed Instruments Survey  2022.pptx
Listed Instruments Survey 2022.pptx
secretariat4121 vistas
DGST Methodology Presentation.pdf por maddierlegum
DGST Methodology Presentation.pdfDGST Methodology Presentation.pdf
DGST Methodology Presentation.pdf
maddierlegum7 vistas
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P... por DataScienceConferenc1
[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...[DSC Europe 23][AI:CSI]  Dragan Pleskonjic - AI Impact on Cybersecurity and P...
[DSC Europe 23][AI:CSI] Dragan Pleskonjic - AI Impact on Cybersecurity and P...
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an... por StatsCommunications
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...
OECD-Persol Holdings Workshop on Advancing Employee Well-being in Business an...
CRM stick or twist.pptx por info828217
CRM stick or twist.pptxCRM stick or twist.pptx
CRM stick or twist.pptx
info82821711 vistas
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ... por DataScienceConferenc1
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...
[DSC Europe 23][AI:CSI] Aleksa Stojanovic - Applying AI for Threat Detection ...
LIVE OAK MEMORIAL PARK.pptx por ms2332always
LIVE OAK MEMORIAL PARK.pptxLIVE OAK MEMORIAL PARK.pptx
LIVE OAK MEMORIAL PARK.pptx
ms2332always7 vistas
6498-Butun_Beyinli_Cocuq-Daniel_J.Siegel-Tina_Payne_Bryson-2011-259s.pdf por 10urkyr34
6498-Butun_Beyinli_Cocuq-Daniel_J.Siegel-Tina_Payne_Bryson-2011-259s.pdf6498-Butun_Beyinli_Cocuq-Daniel_J.Siegel-Tina_Payne_Bryson-2011-259s.pdf
6498-Butun_Beyinli_Cocuq-Daniel_J.Siegel-Tina_Payne_Bryson-2011-259s.pdf
10urkyr347 vistas

Meetup - Brasil - Data In Motion - 2023 September 19

  • 1. Data in Motion: Overview e Novidades do NiFi, Kafka e Flink Tim Spann - Principal Developer Advocate Data In Motion
  • 3. 3 © 2023 Cloudera, Inc. All rights reserved. TODAY’S LEAD Who am I? @PaasDev DZone Zone Leader and Big Data MVB Princeton and NYC Future of Data Meetups ex-Pivotal Field Engineer ex-StreamNative ex-PwC https://github.com/tspannhw https://twitter.com/PaaSDev https://www.datainmotion.dev/ https://medium.com/@tspann Principal Data-in-Motion Developer Advocate
  • 4. 4 © 2023 Cloudera, Inc. All rights reserved. Data in Motion: Overview e Novidades do NiFi, Kafka e Flink Apresentador: Tim Spann - Principal DIM Specialist and Developer Advocate Intro to NiFi Intro to Kafka Intro to Flink Together as FLaNK Demos Q&A
  • 5. © 2023 Cloudera, Inc. All rights reserved. 5 REAL-TIME REQUIRES A PLATFORM SQL Stream Builder
  • 6. © 2023 Cloudera, Inc. All rights reserved. 6 REST API ARCHITECTURE - Using FLaNK to pull the data out of anything in near-real time INGEST PREPARE PUBLISH DATA SOURCES Internal Users (After Sales) External Systems ENTERPRISE LAKEHOUSE CAPABILITY VIEW INGESTION MESSAGE HUB STORAGE BATCH MANAGEMENT STREAM CONSUMPTION Closed Loop Systems SQL Stream Builder Machine Learning Data Visualization Workload Manager watsonx.data
  • 7. Cloudera DataFlow - Apache NiFi
  • 8. © 2019 Cloudera, Inc. All rights reserved. 8 CLOUDERA DATAFLOW - POWERED BY APACHE NiFi Ingest and manage data from edge-to-cloud using a no-code interface ● #1 data ingestion/movement engine ● Strong community ● Product maturity over 11 years ● Deploy on-premises or in the cloud ● Over 400+ pre-built processors ● Built-in data provenance ● Guaranteed delivery ● Throttling and Back pressure
  • 9. © 2023 Cloudera, Inc. All rights reserved. 9 PROVENANCE
  • 10. 10 © 2023 Cloudera, Inc. All rights reserved. RECORD-ORIENTED DATA WITH NIFI • Record Readers - Avro, CSV, Grok, IPFIX, JSAN1, JSON, Parquet, Scripted, Syslog5424, Syslog, WindowsEvent, XML • Record Writers - Avro, CSV, FreeFromText, Json, Parquet, Scripted, XML • Record Reader and Writer support referencing a schema registry for retrieving schemas when necessary. • Enable processors that accept any data format without having to worry about the parsing and serialization logic. • Allows us to keep FlowFiles larger, each consisting of multiple records, which results in far better performance.
  • 11. 11 © 2023 Cloudera, Inc. All rights reserved. RUNNING SQL ON FLOWFILES • Evaluates one or more SQL queries against the contents of a FlowFile. • This can be used, for example, for field-specific filtering, transformation, and row-level filtering. • Columns can be renamed, simple calculations and aggregations performed. • The SQL statement must be valid ANSI SQL and is powered by Apache Calcite.
  • 12. 12 © 2023 Cloudera, Inc. All rights reserved. READYFLOW GALLERY • Cloudera provided flow definitions • Cover most common data flow use cases • Optimized to work with CDP sources/destinations • Can be deployed and adjusted as needed
  • 14. 14 © 2023 Cloudera, Inc. All rights reserved. STREAMS MESSAGING WITH KAFKA • Highly reliable distributed messaging system. • Decouple applications, enables many-to-many patterns. • Publish-Subscribe semantics. • Horizontal scalability. • Efficient implementation to operate at speed with big data volumes. • Organized by topic to support several use cases.
  • 15. Cloudera SQL Stream Builder - Flink SQL
  • 16. 16 © 2023 Cloudera, Inc. All rights reserved. DELIVERING STREAMING ANALYTICS 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. (second) SQL Parsing and Blending Data Streaming Analytics Both offline and streaming data Data Analysts Can Write Queries Across the Lines of Business Capture Events that Matter Low-latency analytics use cases Events Processing
  • 17. 17 © 2022 Cloudera, Inc. All rights reserved. SQL STREAM BUILDER (SSB) SQL STREAM BUILDER allows developers, analysts, and data scientists to write streaming applications with industry standard SQL. No Java or Scala code development required. Simplifies access to data in Kafka & Flink. Connectors to batch data in HDFS, Kudu, Hive, S3, JDBC, CDC and more Enrich streaming data with batch data in a single tool Democratize access to real-time data with just SQL
  • 18. 18 © 2023 Cloudera, Inc. All rights reserved. SSB MATERIALIZED VIEWS Key Takeaway; MV’s allow data scientist, analyst and developers consume data from the firehose
  • 19. Demo
  • 20. 20 © 2023 Cloudera, Inc. All rights reserved. Data in Motion: Overview e Novidades do NiFi, Kafka e Flink Apresentador: Tim Spann - Principal DIM Specialist and Developer Advocate
  • 21. 21 © 2023 Cloudera, Inc. All rights reserved.
  • 23. 23 © 2023 Cloudera, Inc. All rights reserved. Cloudera Streams Processing - Community Edition • Kafka, KConnect, SMM, SR, Flink, and SSB in Docker • Runs in Docker • Try new features quickly • Develop applications locally ● Docker compose file of CSP to run from command line w/o any dependencies, including Flink, SQL Stream Builder, Kafka, Kafka Connect, Streams Messaging Manager and Schema Registry ○ $> docker compose up ● Licensed under the Cloudera Community License ● Unsupported ● Community Group Hub for CSP ● Find it on docs.cloudera.com under Applications
  • 24. Open Source Edition • Apache NiFi in Docker • Runs in Docker • Try new features quickly • Develop applications locally ● Docker NiFi ○ docker run --name nifi -p 8443:8443 -d -e SINGLE_USER_CREDENTIALS_USERNAME=admin -e SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUgh vvgEvjnaLjFEB apache/nifi:latest ● Licensed under the ASF License ● Unsupported https://hub.docker.com/r/apache/nifi
  • 26. © 2023 Cloudera, Inc. All rights reserved. 26 Future of Data - NYC / Princeton + Virtual @PaasDev https://www.meetup.com/futureofdata-princeton/ https://www.meetup.com/futureofdata-newyork/ From Big Data to AI to Streaming to LLM to Cloud to Analytics to NLP to Fast Data to Machine Learning to Microservices to ...
  • 28. 28 © 2023 Cloudera, Inc. All rights reserved. Streaming Resources • https://dzone.com/articles/real-time-stream-processing-with-hazelcast-an d-streamnative • https://flipstackweekly.com/ • https://www.datainmotion.dev/ • https://www.flankstack.dev/ • https://github.com/tspannhw • https://medium.com/@tspann • https://medium.com/@tspann/predictions-for-streaming-in-2023-ad4d739 5d714 • https://www.apachecon.com/acna2022/slides/04_Spann_Tim_Citizen_Str eaming_Engineer.pdf
  • 29. © 2023 Cloudera, Inc. All rights reserved. 29 FLaNK Stack Weekly This week in Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Python, Java and Open Source friends. https://bit.ly/32dAJft
  • 31. LLM USE CASE Vector DB AI Model Unstructured file types Data in Motion on Cloudera Data Platform (CDP) Capture, process & distribute any data, anywhere Other enterprise data Open Data Lakehouse Materialized Views Structured Sources Applications/API’s Streams
  • 33. 33 © 2023 Cloudera, Inc. All rights reserved. TH N Y U