BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF
HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
Streaming Data Ingestion in Big Data and IoT Applications
Guido Schmutz – 27.9.2018
@gschmutz guidoschmutz.wordpress.com
Guido Schmutz
Working at Trivadis for more than 21 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer, Software Architect for Java, Oracle, SOA and
Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
COPENHAGEN
MUNICH
LAUSANNE
BERN
ZURICH
BRUGG
GENEVA
HAMBURG
DÜSSELDORF
FRANKFURT
STUTTGART
FREIBURG
BASEL
VIENNA
With over 600 specialists and IT experts in your region.
14 Trivadis branches and more than
600 employees
200 Service Level Agreements
Over 4,000 training participants
Research and development budget:
CHF 5.0 million
Financially self-supporting and
sustainably profitable
Experience from more than 1,900
projects per year at over 800
customers
Agenda
1. Big Data and IoT Reference Architecture
2. Event Hub
3. Stream Data Integration
Apache NiFi
StreamSets Data Collector
Kafka Connect
4. Summary
Big Data and IoT Reference
Architecture
Big Data solves Volume and Variety – not Velocity
[Diagram: bulk sources (DB, DB extract, file) are loaded via file import / SQL import into a Big Data Platform on a Hadoop cluster (raw and refined storage, parallel processing, results). BI tools, the enterprise data warehouse (SQL), search / explore, and enterprise apps (logic, API) consume the results with high latency.]
Introduction to Stream Processing
Big Data solves Volume and Variety – not Velocity
[Diagram: the same Big Data Platform architecture, now with event sources (location, telemetry, IoT data, mobile apps, social) shown next to the bulk sources.]
Big Data solves Volume and Variety – not Velocity
[Diagram: the event sources deliver an event stream into the Big Data Platform, whose parallel processing covers machine learning, graph algorithms, and natural language processing. Results still arrive with high latency.]
Stream Processing Architecture solves Velocity
[Diagram: event sources (location, telemetry, IoT data, mobile apps, social) send event streams to an Event Hub. A Stream Analytics Platform (stream analytics, reference data / models) reads from the hub and feeds dashboards, search, and enterprise apps (logic, API). Bulk sources, BI tools, and the enterprise data warehouse are shown alongside.]
Low(est) latency, no history
Big Data for all historical data analysis
[Diagram: the Event Hub feeds the Stream Analytics Platform and, through a data flow, the Big Data Platform (raw and refined storage, parallel processing, results). Bulk sources still arrive via file import / SQL import.]
Integrate existing systems through CDC
[Diagram: a traditional silo-based system (user interface, logic, state in a data store) is connected through a CDC connector to the Event Hub, which delivers event streams to data integration and consuming systems.]
Capture changes directly on the database
Change Data Capture (CDC) => think of it as a global database trigger
Transform existing systems into event producers
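The "global database trigger" idea can be sketched in a few lines of Python. This is only an illustration of the concept: real CDC connectors typically read the database's transaction log rather than wrapping writes. The class `ChangeCapturingTable`, the `publish` callback, and the event field names are hypothetical and not taken from the slides.

```python
from typing import Callable, Dict, List, Optional


class ChangeCapturingTable:
    """Toy in-memory table that emits a change event for every write."""

    def __init__(self, name: str, publish: Callable[[dict], None]) -> None:
        self.name = name
        self.rows: Dict[str, dict] = {}
        self.publish = publish  # stands in for "send to the event hub"

    def upsert(self, key: str, row: dict) -> None:
        before: Optional[dict] = self.rows.get(key)
        self.rows[key] = row
        op = "update" if before is not None else "insert"
        self.publish({"table": self.name, "op": op, "key": key,
                      "before": before, "after": row})

    def delete(self, key: str) -> None:
        before = self.rows.pop(key, None)
        if before is not None:  # only emit an event if a row was removed
            self.publish({"table": self.name, "op": "delete", "key": key,
                          "before": before, "after": None})


# Usage: the list plays the role of an event stream.
events: List[dict] = []
customers = ChangeCapturingTable("customers", events.append)
customers.upsert("42", {"name": "Alice"})
customers.upsert("42", {"name": "Alice B."})
customers.delete("42")
```

Every write to the existing system turns into an event without the application logic having to know about the event hub, which is the point the slide makes.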
Integrate existing systems with lower latency through CDC
[Diagram: the combined architecture (bulk and event sources, Event Hub with data flow, Stream Analytics Platform, Big Data Platform), with bulk sources also delivering event streams to the Event Hub in addition to file import / SQL import.]
New systems participate in event-oriented fashion
[Diagram: a Microservice Platform (microservices with state and API) and a Stream Analytics Platform (stream processors with state and API) consume event streams from the Event Hub, next to the Big Data Platform, BI tools, enterprise data warehouse (SQL), search service, and enterprise apps.]
Edge computing allows processing close to data sources
[Diagram: an edge node (rules, event hub, storage) sits between the event sources and the central Event Hub and forwards an event stream. The Big Data Platform, Microservice Platform, Stream Analytics Platform, and the consuming tools are unchanged.]
Unified Architecture for Modern Data Analytics Solutions
[Diagram: bulk sources and event sources (optionally via an edge node with rules, event hub, and storage) feed the Event Hub. Big Data (raw and refined storage, parallel processing, results), Stream Analytics (stream processors with state and API), and Microservices (state and API) build on the hub and serve BI tools, the enterprise data warehouse (SQL), search, and enterprise apps.]
Two Types of Stream Processing
(from Gartner)
Stream Data Integration
• primarily focuses on the ingestion and processing of data sources, targeting real-time extract-transform-load (ETL) and data integration use cases
• filter and enrich the data
• optionally calculate time-windowed aggregations before storing the results in a database or file system
Stream Analytics
• targets analytics use cases
• calculating aggregates and detecting patterns to generate higher-level, more relevant summary information (complex events)
• complex events may signify threats or opportunities that require a response from the business through real-time dashboards, alerts, or decision automation
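The three stream data integration steps (filter, enrich, optional time-windowed aggregation) can be illustrated with plain Python. This is a sketch under assumptions: the `timestamp` field, the non-normal event types, and the `DRIVER_NAMES` lookup are invented for the example; only `truckid`, `driverid`, and `eventtype` appear in the demo message used later in the deck.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical reference data used for enrichment.
DRIVER_NAMES = {"15": "Driver Fifteen", "16": "Driver Sixteen"}


def integrate(events: Iterable[dict]) -> List[dict]:
    """Filter out normal events and enrich the rest with a driver name."""
    result = []
    for event in events:
        if event["eventtype"] == "Normal":
            continue  # filter step
        enriched = dict(event)  # copy, so the input record stays untouched
        enriched["drivername"] = DRIVER_NAMES.get(event["driverid"], "unknown")
        result.append(enriched)
    return result


def count_per_window(events: Iterable[dict],
                     window_ms: int) -> Dict[Tuple[int, str], int]:
    """Count events per (tumbling window start, truck id)."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for event in events:
        window_start = event["timestamp"] - event["timestamp"] % window_ms
        counts[(window_start, event["truckid"])] += 1
    return dict(counts)


sample = [
    {"timestamp": 1000, "truckid": "57", "driverid": "15", "eventtype": "Normal"},
    {"timestamp": 4000, "truckid": "57", "driverid": "15", "eventtype": "Overspeed"},
    {"timestamp": 9000, "truckid": "57", "driverid": "15", "eventtype": "Lane Departure"},
    {"timestamp": 12000, "truckid": "60", "driverid": "99", "eventtype": "Overspeed"},
]
abnormal = integrate(sample)
windows = count_per_window(abnormal, window_ms=10_000)
```

Detecting patterns across such windows (for example, repeated abnormal events from one truck) would be the step into stream analytics.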
Event Hub
Implementing "Event Hub"
[Diagram: the unified architecture with the Event Hub highlighted. Bulk sources (DB, DB extract, file, weather) and event sources (location, telemetry, IoT data, mobile apps, social), directly or via an edge node, deliver data to the hub. The hub supports replay and serves Big Data Batch Analytics, Stream Analytics, and Modern Applications on a cluster infrastructure.]
Apache Kafka – A Streaming Platform
High-Level Architecture
Distributed Log at the Core
Scale-Out Architecture
Logs do not (necessarily) forget
Hold Data for Long-Term – Data Retention
Producer 1
Broker 1
Broker 2
Broker 3
1. Never
2. Time based (TTL)
log.retention.{ms | minutes | hours}
3. Size based
log.retention.bytes
4. Log compaction based
(older entries with the same key are removed):
kafka-topics.sh --zookeeper zk:2181 \
--create --topic customers \
--replication-factor 1 \
--partitions 1 \
--config cleanup.policy=compact
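The time-based and size-based options can be pictured with a small Python sketch. It is a simplification: Kafka applies retention to whole log segments per partition, not to single records, and the function and parameter names below are hypothetical, only loosely mirroring `log.retention.ms` and `log.retention.bytes`.

```python
from typing import List, Tuple

Record = Tuple[int, bytes]  # (timestamp in ms, payload)


def apply_retention(log: List[Record], now_ms: int,
                    retention_ms: int, retention_bytes: int) -> List[Record]:
    """Drop the oldest records until both the age and the size limit hold."""
    # Time based: keep only records that are not older than retention_ms.
    kept = [r for r in log if now_ms - r[0] <= retention_ms]
    # Size based: remove from the head of the log while it is too large.
    while kept and sum(len(payload) for _, payload in kept) > retention_bytes:
        kept.pop(0)
    return kept


log = [(1_000, b"aaaa"), (2_000, b"bbbb"), (3_000, b"cccc"), (4_000, b"dddd")]
remaining = apply_retention(log, now_ms=4_500, retention_ms=3_000,
                            retention_bytes=8)
```

In this run the first record is dropped because of its age and the second because of the size limit.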
Keep Topics in Compacted Form
Before compaction:
Offset  0   1   2   3   4   5   6   7   8   9    10   11
Key     K1  K2  K1  K1  K3  K2  K4  K5  K5  K2   K6   K2
Value   V1  V2  V3  V4  V5  V6  V7  V8  V9  V10  V11

After compaction:
Offset  3   4   6   8   9    10
Key     K1  K3  K4  K5  K2   K6
Value   V4  V5  V7  V9  V10  V11
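The effect of compaction on the example log can be reproduced with a short Python sketch. It only shows the end result, keeping the latest record per key. Kafka itself compacts in the background and handles details such as delete markers that are left out here. The function name `compact` is chosen for illustration, and the sample data uses offsets 0 to 10 of the table above.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value)


def compact(log: List[Record]) -> List[Record]:
    """Keep only the most recent record per key, in offset order."""
    latest: Dict[str, Record] = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)  # later offsets overwrite earlier ones
    return sorted(latest.values())  # tuples sort by offset first


keys = ["K1", "K2", "K1", "K1", "K3", "K2", "K4", "K5", "K5", "K2", "K6"]
log = [(offset, key, f"V{offset + 1}") for offset, key in enumerate(keys)]
compacted = compact(log)
```

The surviving records keep their original offsets, so consumers reading a compacted topic see gaps in the offset sequence.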
Stream Data Integration
Implementing "Stream Data Integration"
[Diagram: the unified architecture again, this time for the stream data integration part: bulk sources (DB, DB extract, file, weather) and event sources (location, telemetry, IoT data, mobile apps, social), directly or via an edge node, are connected to the Event Hub, which supports replay and serves Big Data Batch Analytics, Stream Analytics, and Modern Applications.]
Integrating (Streaming) Data Sources
SQL Polling
Change Data Capture
(CDC)
File Polling
File Stream (File Tailing)
File Stream (Appender)
Sensor Stream
IoT GW
MQTT
Broker
IoT devices will often not be able to
talk to Kafka directly
[Diagram: ways to bring sources into Event Hub topics. DB sources: via Kafka Connect, via a CDC gateway reading the database log, or natively. IoT sensors: via an IoT gateway, via REST, via a dataflow gateway, or via a messaging gateway (queue). File sources and social feeds: via a dataflow gateway or natively. Big Data and stream processing consume the topics.]
Why is Data Ingestion Difficult?
Infrastructure Drift
Physical and logical infrastructure changes rapidly
Key Challenges:
• Infrastructure Automation
• Edge Deployment
Structure Drift
Data structures and formats evolve and change unexpectedly
Key Challenges:
• Consumption Readiness
• Corruption and Loss
Semantic Drift
Data semantics change with evolving applications
Key Challenges:
• Timely Intervention
• System Consistency
Source: StreamSets
Integration with or without Transformation?
Zero Transformation
• No transformation, plain ingest, no schema validation
• Keep the original format – Text, CSV, …
• Allows storing data that may have errors in the schema
Format Transformation
• Better named Format Translation
• Simply change the format
• Change format from Text to Avro
• Does schema validation
Enrichment Transformation
• Add new data to the message
• Do not change existing values
• Convert a value from one system to another and add it to the message
Value Transformation
• Replaces values in the message
• Convert a value from one system to another and change the value in-place
• Destroys the raw data!
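The four options can be contrasted on the truck position message from the demo case. The following Python sketch makes several assumptions: parsing JSON into a typed dictionary stands in for a real format translation such as Text to Avro, and the `EVENT_CODES` lookup and the `eventcode` field are invented for the example.

```python
import json

RAW = ('{"truckid":"57","driverid":"15","routeid":"1927624662",'
       '"eventtype":"Normal","latitude":"38.65","longitude":"-90.21",'
       '"correlationId":"4412891759760421296"}')

# Hypothetical lookup table for the enrichment and value transformations.
EVENT_CODES = {"Normal": 0}


def zero_transformation(raw: str) -> str:
    """Pass the payload through untouched, without validating it."""
    return raw


def format_translation(raw: str) -> dict:
    """Parse the text into a typed record; raises on invalid input."""
    msg = json.loads(raw)
    return {**msg, "latitude": float(msg["latitude"]),
            "longitude": float(msg["longitude"])}


def enrichment(msg: dict) -> dict:
    """Add a derived field and leave every existing value as it is."""
    return {**msg, "eventcode": EVENT_CODES[msg["eventtype"]]}


def value_transformation(msg: dict) -> dict:
    """Overwrite a value in place; the original value is lost downstream."""
    return {**msg, "eventtype": EVENT_CODES[msg["eventtype"]]}


record = format_translation(zero_transformation(RAW))
enriched = enrichment(record)
replaced = value_transformation(record)
```

After `enrichment` the original `eventtype` is still available next to the new field, whereas after `value_transformation` a consumer can no longer recover it, which is why the slide warns that this variant destroys the raw data.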
Demo Case
[Diagram: Truck-1 to Truck-5 publish their positions to MQTT topics truck/nn/position. Still open (marked "?"): how the messages reach the Kafka topic "truck position" and how the raw messages reach the Raw Data Store.]
Sample message:
{"truckid":"57","driverid":"15","routeid":"1927624662","eventtype":"Normal","latitude":"38.65","longitude":"-90.21","correlationId":"4412891759760421296"}
Stream Data Integration: Apache
NiFi
Apache NiFi
• Originated at NSA as Niagarafiles – developed
behind closed doors for 8 years
• Open sourced December 2014, Apache Top
Level Project July 2015
• Look-and-Feel modernized in 2016
• Opaque, “file-oriented” payload
• Distributed system of processors with
centralized control
• Based on flow-based programming concepts
• Data Provenance and Data Lineage
• Web-based user interface
Processors for Source and Sink
• ConsumeXXXX (AMQP, EWS, IMAP, JMS, Kafka, MQTT, POP3, …)
• DeleteXXXX (DynamoDB, Elasticsearch, HDFS, RethinkDB, S3, SQS, ...)
• FetchXXXX (AzureBlobStorage, Elasticsearch, File, FTP, HBase, HDFS, S3 ...)
• ExecuteXXXX (FlumeSink, FlumeSource, Script, SQL, ...)
• GetXXXX (AzureEventHub, Couchbase, DynamoDB, File, FTP, HBase, HDFS,
HTTP, Ignite, JMSQueue, JMSTopic, Kafka, Mongo, Solr, Splunk, SQS, TCP, ...)
• ListenXXXX (HTTP, RELP, SMTP, Syslog, TCP, UDP, WebSocket, ...)
• PublishXXXX (Kafka, MQTT)
• PutXXXX (AzureBlobStorage, AzureEventHub, CassandraQL, CloudWatchMetric,
Couchbase, DynamoDB, Elasticsearch, Email, FTP, File, HBase, HDFS, HiveQL,
Kudu, Lambda, Mongo, Parquet, Slack, SQL, TCP, ...)
• QueryXXXX (Cassandra, DatabaseTable, DNS, Elasticsearch)
Processors for Processing
• ConvertXxxxToYyyy
• ConvertRecord
• EnforceOrder
• EncryptContent
• ExtractXXXX (AvroMetadata,
EmailAttachments, Grok,
HL7Attributes, ImageMetadata, ...)
• GeoEnrichIP
• JoltTransformJSON
• MergeContent
• ReplaceText
• ResizeImage
• SplitXXXX (Avro, Content, JSON,
Record, Xml, ...)
• TailFile
• TransformXML
• UpdateAttribute
Demo Case
[Diagram: Truck-1 to Truck-5 publish to MQTT topics truck/nn/position (MQTT ports 1883 and 1884). An "MQTT to Kafka" flow writes the messages to the Kafka topic "truck position", and a "Kafka to Raw" flow writes them to the Raw Data Store.]
Demo: Dataflow for MQTT to Kafka
Demo: MQTT Processor
Demo: Kafka Processor
Demo: Masking Field with ReplaceText Processor
Stream Data Integration:
StreamSets DataCollector
StreamSets Data Collector
• Founded by ex-Cloudera, Informatica
employees
• Continuous open source, intent-driven, big data
ingest
• Visible, record-oriented approach fixes
combinatorial explosion
• Batch or stream processing
• Standalone, Spark cluster, MapReduce cluster
• IDE for pipeline development by ‘civilians’
• Relatively new - first public release September
2015
• So far, vast majority of commits are from
StreamSets staff
StreamSets Origins
Source: https://streamsets.com/connectors
An origin stage represents the
source for the pipeline. You can
use a single origin stage in a
pipeline
Origins on the right are available
out of the box
API for writing custom origins
StreamSets Processors
A processor stage represents a type of data processing that you want to perform.
Use as many processors in a pipeline as you need.
Programming languages supported
• Java
• JavaScript
• Jython
• Groovy
• Java Expression Language (EL)
• Spark
Some of the processors available out of the box:
• Expression Evaluator
• Field Flattener
• Field Hasher
• Field Masker
• Field Merger
• Field Order
• Field Splitter
• Field Zip
• Groovy Evaluator
• JDBC Lookup
• JSON Parser
• Spark Evaluator
• …
StreamSets Destinations
A destination stage represents
the target for a pipeline. You can
use one or more destinations in a
pipeline
Destinations on the right are
available out of the box
API for writing custom destinations
Source: https://streamsets.com/connectors
Demo Case
[Diagram: Truck-1 to Truck-5 publish to MQTT topics truck/nn/position (MQTT ports 1883 and 1884). Pipelines "MQTT-1 to Kafka" and "MQTT-2 to Kafka", with an Edge component, write to the Kafka topic "truck position", and "Kafka to Raw" writes to the Raw Data Store.]
Demo: Dataflow for MQTT to Kafka
Demo: MQTT Source
Demo: Kafka Sink
Demo: Dataflow for MQTT to Kafka
Demo: Masking fields
Demo: Sending Message to Kafka in Avro
StreamSets Dataflow Performance Manager
• Map dataflows to topologies, manage releases &
track changes
• Measure KPIs and establish baselines for data
availability and accuracy
• Master dataflow operations through Data SLAs
Source: https://streamsets.com/connectors
Stream Data Integration: Kafka
Connect
Kafka Connect - Overview
Source Connector
Sink Connector
Kafka Connect – Single Message Transforms (SMT)
Simple Transformations for a single message
Defined as part of Kafka Connect
• some useful transforms provided out-of-the-box
• Easily implement your own
Optionally deploy 1+ transforms with each
connector
• Modify messages produced by source
connector
• Modify messages sent to sink connectors
Makes it much easier to mix and match connectors
Some of the currently available transforms:
• InsertField
• ReplaceField
• MaskField
• ValueToKey
• ExtractField
• TimestampRouter
• RegexRouter
• SetSchemaMetadata
• Flatten
• TimestampConverter
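How a chain of single message transforms behaves can be pictured with a conceptual Python analogue. This is not the Kafka Connect API (transforms there are Java classes configured on the connector), and the helper functions below only imitate the spirit of MaskField, InsertField, and ValueToKey. For instance, the real ValueToKey builds the key from a list of fields rather than a single plain value.

```python
from typing import Callable, List, Optional, Tuple

Record = Tuple[Optional[str], dict]  # (key, value)
Transform = Callable[[Record], Record]


def mask_field(field: str) -> Transform:
    """Blank out one field of the value."""
    return lambda rec: (rec[0], {**rec[1], field: ""})


def insert_field(field: str, value: str) -> Transform:
    """Add a static field to the value."""
    return lambda rec: (rec[0], {**rec[1], field: value})


def value_to_key(field: str) -> Transform:
    """Use one field of the value as the record key."""
    return lambda rec: (rec[1][field], rec[1])


def apply_chain(record: Record, chain: List[Transform]) -> Record:
    """Apply the transforms one after another to a single message."""
    for transform in chain:
        record = transform(record)
    return record


chain = [mask_field("driverid"), insert_field("source", "mqtt"),
         value_to_key("truckid")]
key, value = apply_chain((None, {"truckid": "57", "driverid": "15"}), chain)
```

Each transform sees exactly one message and has no state across messages. Anything that needs joins or aggregations belongs in a stream processor instead.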
Kafka Connect – Many Connectors
60+ since first release (0.9+)
20+ from Confluent and Partners
Source: http://www.confluent.io/product/connectors
Confluent supported Connectors
Certified Connectors Community Connectors
Demo Case
[Diagram: Truck-1 to Truck-5 publish to MQTT topics truck/nn/position (MQTT ports 1883 and 1884). Connectors "MQTT-1 to Kafka" and "MQTT-2 to Kafka" write to the Kafka topic "truck position", and "Kafka to Raw" writes to the Raw Data Store.]
Demo (II) – devices send to MQTT instead of Kafka
#!/bin/bash
# Register the MQTT source connector through the Kafka Connect REST API
curl -X "POST" "http://192.168.69.138:8083/connectors" \
  -H "Content-Type: application/json" \
  -d $'{
  "name": "mqtt-source",
  "config": {
    "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
    "tasks.max": "1",
    "name": "mqtt-source",
    "mqtt.server.uri": "tcp://mosquitto:1883",
    "mqtt.topics": "truck/+/position",
    "kafka.topic": "truck_position",
    "mqtt.clean.session.enabled": "true",
    "mqtt.connect.timeout.seconds": "30",
    "mqtt.keepalive.interval.seconds": "60",
    "mqtt.qos": "0"
  }
}'
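The subscription `truck/+/position` relies on the MQTT single-level wildcard `+`, so one connector picks up the positions of all trucks. A minimal Python sketch of MQTT topic filter matching shows which topics it covers. The function name `topic_matches` is hypothetical, and the special handling of topics starting with `$` is left out.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a topic filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard: matches the parent and everything below
        if i >= len(topic_levels):
            return False  # filter has more levels than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level differs
    return len(filter_levels) == len(topic_levels)


all_positions = "truck/+/position"
matched = [t for t in ["truck/57/position", "truck/12/position",
                       "truck/57/status", "truck/57/position/raw"]
           if topic_matches(all_positions, t)]
```

Only the two `position` topics with exactly three levels end up in `matched`. A filter such as `truck/#` would additionally cover the other two.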
Summary
Apache NiFi
• visual dataflow modelling
• very powerful – “with power
comes responsibility”
• special package for Edge
computing
• data lineage and data
provenance
• supports backpressure
• no transport mechanism
(DEV/TST/PROD)
• custom processors
• supported by Hortonworks
StreamSets
• visual dataflow modelling
• very powerful – “with power
comes responsibility”
• special package for Edge
computing
• data lineage and data
provenance
• no transport mechanism
• custom sources, sinks,
processors
• supported by StreamSets
Kafka Connect
• declarative style data flows
• simplicity - “simple things
done simple”
• very well integrated with
Kafka – comes with Kafka
• Single Message Transforms
(SMT)
• use Kafka Streams for
complex data flows
• custom connectors
• supported by Confluent
Technology on its own won't help you.
You need to know how to use it properly.

 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaGuido Schmutz
 
Location Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using KafkaLocation Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using KafkaGuido Schmutz
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming VisualisationGuido Schmutz
 
Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?Guido Schmutz
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaGuido Schmutz
 
Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Guido Schmutz
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming VisualizationGuido Schmutz
 
Location Analytics - Real Time Geofencing using Apache Kafka
Location Analytics - Real Time Geofencing using Apache KafkaLocation Analytics - Real Time Geofencing using Apache Kafka
Location Analytics - Real Time Geofencing using Apache KafkaGuido Schmutz
 

Más de Guido Schmutz (20)

30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as Code30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as Code
 
Event Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureEvent Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data Architecture
 
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsBig Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
 
Event Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data ArchitectureEvent Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data Architecture
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
 
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) ArchitectureEvent Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache Kafka
 
Location Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache KafkaLocation Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache Kafka
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaSolutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
 
What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
 
Location Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using KafkaLocation Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using Kafka
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming Visualisation
 
Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
 
Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
Location Analytics - Real Time Geofencing using Apache Kafka
Location Analytics - Real Time Geofencing using Apache KafkaLocation Analytics - Real Time Geofencing using Apache Kafka
Location Analytics - Real Time Geofencing using Apache Kafka
 

Último

RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxellehsormae
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxAleenaJamil4
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 

Último (20)

RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptx
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptx
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 

Streaming Data Integration Tools

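  • 1. Streaming Data Ingestion in Big Data and IoT Applications (original German title: Streaming Data Ingestion in BigData- und IoT-Anwendungen). Guido Schmutz, 27.9.2018. @gschmutz, guidoschmutz.wordpress.com. Trivadis locations: Basel, Bern, Brugg, Düsseldorf, Frankfurt a.M., Freiburg i.Br., Geneva, Hamburg, Copenhagen, Lausanne, Munich, Stuttgart, Vienna, Zurich.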
  • 2. Guido Schmutz: Working at Trivadis for more than 21 years. Oracle ACE Director for Fusion Middleware and SOA. Consultant, trainer and software architect for Java, Oracle, SOA and Big Data / Fast Data. Head of Trivadis Architecture Board. Technology Manager @ Trivadis. More than 30 years of software development experience. Contact: guido.schmutz@trivadis.com Blog: http://guidoschmutz.wordpress.com Slideshare: http://www.slideshare.net/gschmutz Twitter: gschmutz
  • 3. Trivadis: With over 600 specialists and IT experts in your region. 14 Trivadis branches and more than 600 employees. 200 Service Level Agreements. Over 4,000 training participants. Research and development budget: CHF 5.0 million. Financially self-supporting and sustainably profitable. Experience from more than 1,900 projects per year at over 800 customers. [Map: Copenhagen, Munich, Lausanne, Bern, Zurich, Brugg, Geneva, Hamburg, Düsseldorf, Frankfurt, Stuttgart, Freiburg, Basel, Vienna]
  • 4. Agenda: 1. Big Data and IoT Reference Architecture 2. Event Hub 3. Stream Data Integration (Apache NiFi, StreamSets Data Collector, Kafka Connect) 4. Summary
  • 5. Big Data and IoT Reference Architecture
  • 6. Big Data solves Volume and Variety, not Velocity. [Diagram: bulk sources (DB extract, file, DB) are loaded by file import / SQL import into a Big Data Platform (Hadoop cluster with storage for raw and refined data, parallel processing, results); BI tools, enterprise data warehouse, SQL, search / explore and enterprise apps (logic, API) sit on top; high latency.]
  • 7. Big Data solves Volume and Variety, not Velocity. [Same diagram, with event sources added: location, telemetry, IoT data, mobile apps and social produce an event stream.]
  • 8. Big Data solves Volume and Variety, not Velocity. [Same diagram, with an Event Hub added for the event streams; parallel processing is annotated with machine learning, graph algorithms and natural language processing.]
  • 9. Stream Processing Architecture solves Velocity. [Diagram: bulk sources and event sources feed event streams through an Event Hub into a Stream Analytics Platform (stream analytics, reference data / models, results); consumers are dashboard, search / explore, BI tools, enterprise data warehouse and enterprise apps; low(est) latency, no history.]
  • 10. Big Data for all historical data analysis. [Diagram: Stream Analytics Platform and Big Data Platform combined; the Event Hub delivers event streams to stream analytics and a data flow into Hadoop storage (raw, refined), next to file import / SQL import from bulk sources.]
  • 11. Integrate existing systems through CDC. Capture changes directly on the database. Change Data Capture (CDC): think of it as a global database trigger. It turns existing systems into event producers. [Diagram: a traditional silo-based system (user interface, logic, state in a data store) with a CDC connector that publishes event streams to the Event Hub for integration and consuming systems.]
  • 12. Integrate existing systems with lower latency through CDC. [Diagram: combined Stream Analytics Platform and Big Data Platform architecture, with the same components as on slide 10.]
  • 13. New systems participate in event-oriented fashion. [Diagram: a Microservice Platform (microservice with state and API) appears next to the Stream Analytics Platform (stream processor with state and API) and the Big Data Platform; all are connected to the Event Hub through event streams.]
  • 14. Edge computing allows processing close to data sources. [Diagram: the architecture of slide 13 plus an edge node with rules, its own event hub and storage, placed near the event sources.]
  • 15. Unified Architecture for Modern Data Analytics Solutions. [Diagram: bulk sources, event sources and the edge node feed the Event Hub; Big Data (storage for raw and refined data, parallel processing), Stream Analytics (stream processor) and Microservices work on event streams; results reach BI tools, enterprise data warehouse, search / explore and enterprise apps.]
  • 16. Two Types of Stream Processing (from Gartner). Stream Data Integration: • primarily focuses on the ingestion and processing of data sources, targeting real-time extract-transform-load (ETL) and data integration use cases • filters and enriches the data • optionally calculates time-windowed aggregations before storing the results in a database or file system. Stream Analytics: • targets analytics use cases • calculates aggregates and detects patterns to generate higher-level, more relevant summary information (complex events) • complex events may signify threats or opportunities that require a response from the business through real-time dashboards, alerts or decision automation
  • 18. Implementing "Event Hub". [Diagram: the unified architecture on a shared cluster infrastructure, with weather as an additional source, replay from the Event Hub, and the areas Big Data, Batch Analytics, Stream Analytics and Modern Applications.]
  • 19. Apache Kafka: A Streaming Platform. High-level architecture. Distributed log at the core. Scale-out architecture. Logs do not (necessarily) forget.
  • 20. Hold Data for Long-Term: Data Retention. [Diagram: Producer 1 writing to Broker 1, Broker 2 and Broker 3.] Retention options: 1. Never delete 2. Time based (TTL): log.retention.{ms | minutes | hours} 3. Size based: log.retention.bytes 4. Log compaction based (older entries with the same key are removed, the latest entry per key is kept): kafka-topics.sh --zookeeper zk:2181 --create --topic customers --replication-factor 1 --partitions 1 --config cleanup.policy=compact
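The time-based and size-based options can be illustrated with a small Python model. This is a deliberately simplified sketch, not Kafka's implementation: the `Segment` class and `segments_to_delete` function are hypothetical names, the log is assumed to be a list of segments ordered oldest first, and details such as the active segment never being deleted are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    base_offset: int
    size_bytes: int
    last_timestamp_ms: int  # timestamp of the newest record in the segment


def segments_to_delete(
    segments: List[Segment],
    now_ms: int,
    retention_ms: Optional[int] = None,
    retention_bytes: Optional[int] = None,
) -> List[Segment]:
    """Return the oldest segments that a retention policy would drop."""
    doomed: List[Segment] = []
    remaining = list(segments)  # oldest first

    # Time based: drop leading segments whose newest record is too old.
    if retention_ms is not None:
        while remaining and now_ms - remaining[0].last_timestamp_ms > retention_ms:
            doomed.append(remaining.pop(0))

    # Size based: drop leading segments while the rest still holds
    # at least retention_bytes of data.
    if retention_bytes is not None:
        total = sum(s.size_bytes for s in remaining)
        while remaining and total - remaining[0].size_bytes >= retention_bytes:
            total -= remaining[0].size_bytes
            doomed.append(remaining.pop(0))

    return doomed
```

With neither limit set, nothing is deleted, which corresponds to the "never" option on the slide.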
  • 21. Keep Topics in Compacted Form
      Before compaction:
        Offset | 0   1   2   3   4   5   6   7   8   9   10
        Key    | K1  K2  K1  K1  K3  K2  K4  K5  K5  K2  K6
        Value  | V1  V2  V3  V4  V5  V6  V7  V8  V9  V10 V11
      After compaction:
        Offset | 3   4   6   8   9   10
        Key    | K1  K3  K4  K5  K2  K6
        Value  | V4  V5  V7  V9  V10 V11
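The effect of compaction can be reproduced with a toy in-memory model in Python. This is only a sketch of the idea (keep the latest record per key), not Kafka's log cleaner: the `compact` function and the tuple layout are illustrative assumptions, and the sample log assumes offsets 0 to 10 with the keys and values from the slide.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value)


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record per key, in offset order."""
    latest: Dict[str, Record] = {}
    for offset, key, value in log:
        # A later record with the same key replaces the earlier one.
        latest[key] = (offset, key, value)
    # Tuples sort by their first element, i.e. by offset.
    return sorted(latest.values())


keys = ["K1", "K2", "K1", "K1", "K3", "K2", "K4", "K5", "K5", "K2", "K6"]
log = [(offset, key, f"V{offset + 1}") for offset, key in enumerate(keys)]

for record in compact(log):
    print(record)
```

Surviving records keep their original offsets, which is why the compacted log has gaps in its offset sequence.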
  • 23. Implementing "Stream Data Integration". [Diagram: the unified architecture on a shared cluster infrastructure, with weather as an additional source, replay from the Event Hub, and the areas Big Data, Batch Analytics, Stream Analytics and Modern Applications.]
  • 24. Integrating (Streaming) Data Sources: SQL Polling, Change Data Capture (CDC), File Polling, File Stream (File Tailing), File Stream (Appender), Sensor Stream
  • 25. IoT devices will often not be able to talk to Kafka directly. [Diagram: paths into the Event Hub topics: IoT sensors via IoT gateway and MQTT broker, via REST, or natively; DB sources via CDC gateway, Kafka Connect, or native CDC on the log; file sources and logs via a dataflow gateway; queues via a messaging gateway; social feeds natively. Big Data and Stream Processing consume from the topics.]
  • 26. Why is Data Ingestion Difficult? Infrastructure Drift: physical and logical infrastructure changes rapidly. Key challenges: infrastructure automation, edge deployment. Structure Drift: data structures and formats evolve and change unexpectedly. Key challenges: consumption readiness, corruption and loss. Semantic Drift: data semantics change with evolving applications. Key challenges: timely intervention, system consistency. Source: StreamSets
  • 27. Integration with or without Transformation? Zero Transformation: • No transformation, plain ingest, no schema validation • Keeps the original format (text, CSV, …) • Allows storing data that may have errors in the schema. Format Transformation: • Better called format translation • Simply changes the format, e.g. from text to Avro • Does schema validation. Enrichment Transformation: • Adds new data to the message • Does not change existing values • Converts a value from one system to another and adds it to the message. Value Transformation: • Replaces values in the message • Converts a value from one system to another and changes the value in place • Destroys the raw data!
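The four options can be contrasted on one small message. The following Python sketch is illustrative only: the function names are hypothetical, the message is a shortened version of the truck event used in the demo, and a JSON parse with a required-field check stands in for the text-to-Avro translation and schema validation mentioned on the slide.

```python
import json

RAW = '{"truckid":"57","driverid":"15","latitude":"38.65","longitude":"-90.21"}'


def zero_transformation(raw: str) -> str:
    """Plain ingest: pass the payload through untouched, no validation."""
    return raw


def format_translation(raw: str) -> dict:
    """Change the representation and validate; malformed input is rejected."""
    record = json.loads(raw)
    for field in ("truckid", "latitude", "longitude"):
        if field not in record:
            raise ValueError(f"missing field: {field}")
    return record


def enrichment(record: dict) -> dict:
    """Add a converted value under a new name; existing values stay intact."""
    enriched = dict(record)
    enriched["latitude_num"] = float(record["latitude"])
    return enriched


def value_transformation(record: dict) -> dict:
    """Replace the value in place; the original string is gone afterwards."""
    changed = dict(record)
    changed["latitude"] = float(record["latitude"])
    return changed
```

After `enrichment` the original string `"38.65"` is still in the record, while after `value_transformation` only the converted number remains, which is the sense in which the raw data is destroyed.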
  • 28. Demo Case. [Diagram: Truck-1 to Truck-5 publish to MQTT topics truck/nn/position; open questions are how the data reaches the "truck position" topic (raw?) and the Raw Data Store.] Sample message: {"truckid":"57","driverid":"15","routeid":"1927624662","eventtype":"Normal","latitude":"38.65","longitude":"-90.21","correlationId":"4412891759760421296"}
  • 30. Apache NiFi • Originated at NSA as Niagarafiles – developed behind closed doors for 8 years • Open sourced December 2014, Apache Top Level Project July 2015 • Look-and-Feel modernized in 2016 • Opaque, “file-oriented” payload • Distributed system of processors with centralized control • Based on flow-based programming concepts • Data Provenance and Data Lineage • Web-based user interface
  • 31. Processors for Source and Sink • ConsumeXXXX (AMQP, EWS, IMAP, JMS, Kafka, MQTT, POP3, …) • DeleteXXXX (DynamoDB, Elasticsearch, HDFS, RethinkDB, S3, SQS, ...) • FetchXXXX (AzureBlobStorage, Elasticsearch, File, FTP, HBase, HDFS, S3, ...) • ExecuteXXXX (FlumeSink, FlumeSource, Script, SQL, ...) • GetXXXX (AzureEventHub, Couchbase, DynamoDB, File, FTP, HBase, HDFS, HTTP, Ignite, JMSQueue, JMSTopic, Kafka, Mongo, Solr, Splunk, SQS, TCP, ...) • ListenXXXX (HTTP, RELP, SMTP, Syslog, TCP, UDP, WebSocket, ...) • PublishXXXX (Kafka, MQTT) • PutXXXX (AzureBlobStorage, AzureEventHub, CassandraQL, CloudWatchMetric, Couchbase, DynamoDB, Elasticsearch, Email, FTP, File, HBase, HDFS, HiveQL, Kudu, Lambda, Mongo, Parquet, Slack, SQL, TCP, ...) • QueryXXXX (Cassandra, DatabaseTable, DNS, Elasticsearch)
  • 32. Processors for Processing • ConvertXxxxToYyyy • ConvertRecord • EnforceOrder • EncryptContent • ExtractXXXX (AvroMetadata, EmailAttachments, Grok, HL7Attributes, ImageMetadata, ...) • GeoEnrichIP • JoltTransformJSON • MergeContent • ReplaceText • ResizeImage • SplitXXXX (Avro, Content, JSON, Record, Xml, ...) • TailFile • TransformXML • UpdateAttribute
  • 33. Demo Case. [Diagram: Truck-1 to Truck-5 publish to MQTT topics truck/nn/position on ports 1883 and 1884; an "MQTT to Kafka" flow writes to the "truck position" topic (raw); a "Kafka to Raw" flow writes to the Raw Data Store.] Sample message: {"truckid":"57","driverid":"15","routeid":"1927624662","eventtype":"Normal","latitude":"38.65","longitude":"-90.21","correlationId":"4412891759760421296"}
  • 34. Demo: Dataflow for MQTT to Kafka
  • 37. Demo: Masking Field with ReplaceText Processor
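The slide transcript does not say which field the demo masks or how the processor is configured. As a rough illustration of the underlying idea, a regular-expression search-and-replace on the text payload, here is a Python sketch; the choice of `driverid`, the `XXX` mask and the function name are assumptions for illustration and not taken from the demo.

```python
import re

MESSAGE = '{"truckid":"57","driverid":"15","eventtype":"Normal"}'

# Capture everything up to the opening quote of the value (group 1)
# and the closing quote (group 2); the value in between is replaced.
DRIVER_ID = re.compile(r'("driverid"\s*:\s*")[^"]*(")')


def mask_driver_id(text: str) -> str:
    """Replace the driverid value with a fixed mask, leaving the rest as is."""
    return DRIVER_ID.sub(r"\g<1>XXX\g<2>", text)


print(mask_driver_id(MESSAGE))
```

Because the payload is treated as plain text, this works without parsing the JSON, but it also means a message in a different layout passes through unmasked.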
  • 39. StreamSets Data Collector • Founded by ex-Cloudera, Informatica employees • Continuous open source, intent-driven, big data ingest • Visible, record-oriented approach fixes combinatorial explosion • Batch or stream processing • Standalone, Spark cluster, MapReduce cluster • IDE for pipeline development by ‘civilians’ • Relatively new - first public release September 2015 • So far, vast majority of commits are from StreamSets staff
  • 40. StreamSets Origins. An origin stage represents the source for the pipeline. You can use a single origin stage in a pipeline. Origins on the right are available out of the box. API for writing custom origins. Source: https://streamsets.com/connectors
  • 41. StreamSets Processors. A processor stage represents a type of data processing that you want to perform. Use as many processors in a pipeline as you need. Programming languages supported: • Java • JavaScript • Jython • Groovy • Java Expression Language (EL) • Spark. Some of the processors available out of the box: • Expression Evaluator • Field Flattener • Field Hasher • Field Masker • Field Merger • Field Order • Field Splitter • Field Zip • Groovy Evaluator • JDBC Lookup • JSON Parser • Spark Evaluator • …
  • 42. StreamSets Destinations. A destination stage represents the target for a pipeline. You can use one or more destinations in a pipeline. Destinations on the right are available out of the box. API for writing custom origins. Source: https://streamsets.com/connectors
  • 43. Demo Case Truck-2 truck/nn/ position Truck-1 Truck-3 truck position raw truck/nn/ positionTruck-4 Truck-5 Raw Data Store MQTT-1 to Kafka Kafka to Raw {"truckid":"57","driverid":"15","routeid":"192762466 2","eventtype":"Normal","latitude":"38.65","longitu de":"- 90.21","correlationId":"4412891759760421296"} MQTT-2 to Kafka Edge Port: 1883 Port: 1884
  • 44. Demo: Dataflow for MQTT to Kafka
  • 47. Demo: Dataflow for MQTT to Kafka
  • 49. Demo: Sending Message to Kafka in Avro
  • 50. StreamSets Dataflow Performance Manager • Map dataflows to topologies, manage releases & track changes • Measure KPIs and establish baselines for data availability and accuracy • Master dataflow operations through Data SLAs Source: https://streamsets.com/connectors
  • 51. Stream Data Integration: Kafka Connect
  • 52. Kafka Connect - Overview Source Connector Sink Connector
  • 53. Kafka Connect – Single Message Transforms (SMT) Simple transformations for a single message. Defined as part of Kafka Connect: • some useful transforms provided out of the box • easily implement your own Optionally deploy 1+ transforms with each connector: • modify messages produced by source connectors • modify messages sent to sink connectors Makes it much easier to mix and match connectors. Some of the currently available transforms: • InsertField • ReplaceField • MaskField • ValueToKey • ExtractField • TimestampRouter • RegexRouter • SetSchemaMetadata • Flatten • TimestampConverter
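Transforms are declared as extra properties in a connector's configuration: a `transforms` list of aliases, then a `transforms.<alias>.type` and the settings for each alias. The Python sketch below only builds such a property fragment. The helper `smt_properties`, the aliases `mask` and `route`, the field name `driverid`, and the topic pattern are assumptions for illustration. Whether `MaskField` applies to a given connector depends on the record value being structured (a Struct or Map), which is also an assumption here.

```python
import json


def smt_properties(transforms):
    """Build the SMT part of a connector config.

    transforms: ordered list of (alias, transform class, settings dict).
    The order of the aliases is the order in which the chain is applied.
    """
    props = {"transforms": ",".join(alias for alias, _, _ in transforms)}
    for alias, cls, settings in transforms:
        props[f"transforms.{alias}.type"] = cls
        for key, value in settings.items():
            props[f"transforms.{alias}.{key}"] = value
    return props


props = smt_properties([
    # Mask one field of the record value (hypothetical field choice).
    ("mask", "org.apache.kafka.connect.transforms.MaskField$Value",
     {"fields": "driverid"}),
    # Rewrite the topic name with a regex (hypothetical pattern).
    ("route", "org.apache.kafka.connect.transforms.RegexRouter",
     {"regex": "truck_(.*)", "replacement": "masked_$1"}),
])
print(json.dumps(props, indent=2))
```

The resulting properties would be merged into the `config` object of a connector definition before it is posted to the Connect REST API.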
  • 54. Kafka Connect – Many Connectors 60+ since first release (0.9+) 20+ from Confluent and Partners Source: http://www.confluent.io/product/connectors Confluent supported Connectors Certified Connectors Community Connectors
  • 55. Demo Case [Diagram labels: Truck-1 to Truck-5; truck/nn/position (Port: 1883); truck/nn/position (Port: 1884); MQTT-1 to Kafka; MQTT-2 to Kafka; truck position raw; Kafka to Raw; Raw Data Store] Sample message: {"truckid":"57","driverid":"15","routeid":"1927624662","eventtype":"Normal","latitude":"38.65","longitude":"-90.21","correlationId":"4412891759760421296"}
  • 56. Demo (II) – devices send to MQTT instead of Kafka
      #!/bin/bash
      # Register the MQTT source connector through the Kafka Connect REST API.
      curl -X "POST" "http://192.168.69.138:8083/connectors" \
           -H "Content-Type: application/json" \
           -d $'{
        "name": "mqtt-source",
        "config": {
          "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
          "tasks.max": "1",
          "name": "mqtt-source",
          "mqtt.server.uri": "tcp://mosquitto:1883",
          "mqtt.topics": "truck/+/position",
          "kafka.topic": "truck_position",
          "mqtt.clean.session.enabled": "true",
          "mqtt.connect.timeout.seconds": "30",
          "mqtt.keepalive.interval.seconds": "60",
          "mqtt.qos": "0"
        }
      }'
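The `mqtt.topics` value `truck/+/position` is an MQTT topic filter: `+` matches exactly one topic level, so a single subscription covers the position topic of every truck. The Python sketch below shows that matching rule in simplified form. The function name `topic_matches` is hypothetical, and special cases such as topics starting with `$` are left out.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT topic filter matching with '+' and '#' wildcards."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches the parent and everything below it.
            return True
        if i >= len(topic_levels):
            # The filter has more levels than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            # A literal level must match exactly; '+' accepts any single level.
            return False
    # Without '#', filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


for topic in ("truck/57/position", "truck/57/status", "truck/57/position/raw"):
    print(topic, topic_matches("truck/+/position", topic))
```

With this filter, messages from all trucks end up in the one Kafka topic named by `kafka.topic`, while other topics on the same broker are ignored.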
  • 58. Summary
      Apache NiFi • visual dataflow modelling • very powerful – “with power comes responsibility” • special package for Edge computing • data lineage and data provenance • support for backpressure • no transport mechanism (DEV/TST/PROD) • custom processors • supported by Hortonworks
      StreamSets • visual dataflow modelling • very powerful – “with power comes responsibility” • special package for Edge computing • data lineage and data provenance • no transport mechanism • custom sources, sinks, processors • supported by StreamSets
      Kafka Connect • declarative style data flows • simplicity – “simple things done simple” • very well integrated with Kafka – comes with Kafka • Single Message Transforms (SMT) • use Kafka Streams for complex data flows • custom connectors • supported by Confluent
  • 59. Technology on its own won't help you. You need to know how to use it properly.