SlideShare una empresa de Scribd logo
1 de 33
Agenda
Me and DataMountaineer
Data Integration
Kafka Connect
Demo
Andrew Stevenson
Lead Mountain Goat at DM
Solution Architect
- Fast Data
- Big Data as long as it’s Fast
Contributed to
- Kafka Connect, Sqoop, Kite SDK
Kafka Connect Connectors
20+ connectors.
DataStax Certified Cassandra Sink
Professional Services
Implementations, Architecture reviews
DevOps & Tooling
Connector Support
Partners
Data Integration
Loading and unloading data
should be easy
But it takes too long
(certainly on hadoop)
Why?
Enterprise pipelines must consider:
Delivery semantics
Offset management
Serialization / de-serialization
Partitioning / scalability
Fault tolerance / failover
Data model integration
CI/CD
Metrics / monitoring
Which results in?
Multiple technologies
- Bash wrappers on Sqoop
- Oozie Xml 
- Custom Java/Scala/C#
- Third Party
- Multiple teams hand roll similar solutions
Lack of separation of concerns
- Extract/loading ends up domain specific
What we really care
about…
DOMAIN SPECIFIC TRANSFORMATIONS
Focus on adding value
Kafka Connect?
✓Delivery semantics
✓Offset management
✓Serialization / de-serialization
✓Partitioning / scalability
✓Fault tolerance / fail-over
✓Data model integration
✓Metrics
Out of the Box – ONE FRAMEWROK
Lets you focus on domain logic
Kafka Connect
“a common framework facilitating
data streams
between kafka and other systems”
Ease of use 
deploy flows via configuration files
with no code necessary
Out of the box & Community
Configurations
are key-value mappings
name connector’s unique name
connector.class connector’s class
max.tasks maximum tasks to create
Option[topics] list of topics (for sinks)
Config Example
name = kudu-sink
connector.class = KuduSinkConnector
tasks.max = 1
topics = kudu_test
connect.kudu.master = quickstart
connect.kudu.sink.kcql = INSERT INTO KuduTable
SELECT * FROM kudu_test
KCQL
is a SQL like syntax allowing streamlined
configuration of Kafka Sink Connectors and
then some more..
Example:
Project fields, rename or ignore them and further
customise in plain text
INSERT INTO transactions SELECT field1 AS column1, field2 AS column2 FROM TransactionTopic;
INSERT INTO audits SELECT * FROM AuditsTopic;
INSERT INTO logs SELECT * FROM LogsTopic AUTOEVOLVE;
INSERT INTO invoices SELECT * FROM InvoiceTopic PK invoiceID;
KCQL |
{ "sensor_id": "01" , "temperature": 52.7943, "ts": 1484648810 }
{ “sensor_id": "02" , "temperature": 28.8597, "ts": 1484648810 }
INSERT INTO sensor_ringbuffer
SELECT sensor_id, temperature, ts
FROM coap_sensor_topic
WITHFORMAT JSON
STOREAS RING_BUFFER
INSERT INTO sensor_reliabletopic
SELECT sensor_id, temperature, ts
FROM coap_sensor_topic
WITHFORMAT AVRO
STOREAS RELIABLE_TOPIC
Deeper into Connect
Modes
Workers
Connectors
Tasks
Converters
Modes
Standalone
- single node, for testing, one off import/exports
Distributed
- 1 or more workers on 1 or more servers
form a cluster/consumer group
Interaction
Standalone
- properties file at start up
Distributed
- rest API  ./cli
- create  ./cli create conn < conn.conf
- start  ./cli start conn < conn.conf
- stop  ./cli stop conn
- restart  ./cli restart conn
- pause  ./cli pause conn
- remove  ./cli rm conn
- status  ./cli ps
- plugins  ./cli plugins
- validate  ./cli validate < conn.conf
Workers
group.id = cluster1
group.id = cluster1
group.id = cluster1
* Kafka Consumer Group
Kafka
Connectors
Define the how
- which plugin to use, must be on CLASSPATH
- breaks up work into tasks
- multiple per cluster, unique name
- fault tolerant
Happy flow
Rest
API
W1
W2
W3
C1
C1 T1
Config topic
C1 T1
C1 T2
C1 T3
Put
* Coordinator
Unhappy flow
W1
W2
W3
C1
C1 T1
Config topic
C1 T1
C1 T2
C1 T3
* ReBalanceListener
Connector API
class CoherenceSinkConnector extends SinkConnector {
override def taskClass(): Class[_ <: Task]
override def start(props: util.Map[String, String]): Unit
override def taskConfigs(maxTasks: Int): util.List[util.Map[String, String]]
override def stop(): Unit
}
Tasks
Perform the actual work
- loading / unloading
- single threaded
Used to Scale
- more task more parallelism
- kafka consumer group managed
- if (tasks > partitions) => idle tasks
Contains no state, it’s in Kafka
- started
- stopped
- restarted
- pause
Task API
class CoherenceSinkTask extends SinkTask {
override def start(props: util.Map[String, String]): Unit
override def stop(): Unit
override def flush(offsets: util.Map[TopicPartition,
OffsetAndMetadata])
override def put(records: util.Collection[SinkRecord])
}
Converters
JsonConverter
- ships with Kafka
AvroConverter
- ships with Confluent
- integrates with Schema Registry
kafka-connect-blockchain
kafka-connect-bloomberg
kafka-connect-cassandra
kafka-connect-coap
kafka-connect-druid
kafka-connect-elastic
kafka-connect-ftp
kafka-connect-hazelcast
kafka-connect-hbase
kafka-connect-influxdb
kafka-connect-jms
kafka-connect-kudu
kafka-connect-mongodb
kafka-connect-mqtt
kafka-connect-redis
kafka-connect-rethink
kafka-connect-voltdb
kafka-connect-yahoo
Source: https://github.com/datamountaineer/stream-reactor
Integration Tests: http://coyote.landoop.com/connect/
Connect UI
Schema UI
Topic UI
Dockers
Cloudera CSD
Kafka Connect
Scalable
Fault tolerant
Common framework
Does the hard for work you
DEMO

Más contenido relacionado

La actualidad más candente

Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)
Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)
Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)
Keigo Suda
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around Kafka
Guido Schmutz
 

La actualidad más candente (20)

Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1
 
Data integration with Apache Kafka
Data integration with Apache KafkaData integration with Apache Kafka
Data integration with Apache Kafka
 
Introduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridIntroduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - Madrid
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Kafka connect 101
Kafka connect 101Kafka connect 101
Kafka connect 101
 
Confluent Developer Training
Confluent Developer TrainingConfluent Developer Training
Confluent Developer Training
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Kafka Connect
Kafka ConnectKafka Connect
Kafka Connect
 
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
 
Confluent building a real-time streaming platform using kafka streams and k...
Confluent   building a real-time streaming platform using kafka streams and k...Confluent   building a real-time streaming platform using kafka streams and k...
Confluent building a real-time streaming platform using kafka streams and k...
 
Scaling an Event-Driven Architecture with IBM and Confluent | Antony Amanse a...
Scaling an Event-Driven Architecture with IBM and Confluent | Antony Amanse a...Scaling an Event-Driven Architecture with IBM and Confluent | Antony Amanse a...
Scaling an Event-Driven Architecture with IBM and Confluent | Antony Amanse a...
 
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson LearnedApache Kafka from 0.7 to 1.0, History and Lesson Learned
Apache Kafka from 0.7 to 1.0, History and Lesson Learned
 
Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)
Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)
Apache Kafka & Kafka Connectを に使ったデータ連携パターン(改めETLの実装)
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, NutanixGuaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
 
Changing landscapes in data integration - Kafka Connect for near real-time da...
Changing landscapes in data integration - Kafka Connect for near real-time da...Changing landscapes in data integration - Kafka Connect for near real-time da...
Changing landscapes in data integration - Kafka Connect for near real-time da...
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around Kafka
 

Destacado

Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, SparkBuilding Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Todd Fritz
 
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Spark Summit
 

Destacado (20)

Processing IoT Data with Apache Kafka
Processing IoT Data with Apache KafkaProcessing IoT Data with Apache Kafka
Processing IoT Data with Apache Kafka
 
Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017
 
Getting started with Azure Event Hubs and Stream Analytics services
Getting started with Azure Event Hubs and Stream Analytics servicesGetting started with Azure Event Hubs and Stream Analytics services
Getting started with Azure Event Hubs and Stream Analytics services
 
Blr hadoop meetup
Blr hadoop meetupBlr hadoop meetup
Blr hadoop meetup
 
Storm over gearpump
Storm over gearpumpStorm over gearpump
Storm over gearpump
 
London Apache Kafka Meetup (Jan 2017)
London Apache Kafka Meetup (Jan 2017)London Apache Kafka Meetup (Jan 2017)
London Apache Kafka Meetup (Jan 2017)
 
Not Only Streams for Akademia JLabs
Not Only Streams for Akademia JLabsNot Only Streams for Akademia JLabs
Not Only Streams for Akademia JLabs
 
IoT Connected Brewery
IoT Connected BreweryIoT Connected Brewery
IoT Connected Brewery
 
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
 
Apache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platformApache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platform
 
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, SparkBuilding Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
 
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
 
IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...
IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...
IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
 
Comparison of various streaming technologies
Comparison of various streaming technologiesComparison of various streaming technologies
Comparison of various streaming technologies
 
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
 
Real-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS AzureReal-Time Event & Stream Processing on MS Azure
Real-Time Event & Stream Processing on MS Azure
 
Reactive integrations with Akka Streams
Reactive integrations with Akka StreamsReactive integrations with Akka Streams
Reactive integrations with Akka Streams
 

Similar a Kafka connect

Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
DataWorks Summit
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
Wei Ting Chen
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value Store
Santal Li
 
Distribute key value_store
Distribute key value_storeDistribute key value_store
Distribute key value_store
drewz lin
 
LinuxCon 2013 Steven Dake on Using Heat for autoscaling OpenShift on Openstack
LinuxCon 2013 Steven Dake on Using Heat for autoscaling OpenShift on OpenstackLinuxCon 2013 Steven Dake on Using Heat for autoscaling OpenShift on Openstack
LinuxCon 2013 Steven Dake on Using Heat for autoscaling OpenShift on Openstack
OpenShift Origin
 

Similar a Kafka connect (20)

AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and Drupal
 
How to Treat a Network Like a Container (Or Get Close)
How to Treat a Network Like a Container (Or Get Close)How to Treat a Network Like a Container (Or Get Close)
How to Treat a Network Like a Container (Or Get Close)
 
All Things Open 2017: How to Treat a Network as a Container
All Things Open 2017: How to Treat a Network as a ContainerAll Things Open 2017: How to Treat a Network as a Container
All Things Open 2017: How to Treat a Network as a Container
 
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
 
20171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v0120171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v01
 
DevOpsDaysPhoenix - CI/CD Pipelines for CDN
DevOpsDaysPhoenix -  CI/CD Pipelines for CDNDevOpsDaysPhoenix -  CI/CD Pipelines for CDN
DevOpsDaysPhoenix - CI/CD Pipelines for CDN
 
Architecting Applications
Architecting ApplicationsArchitecting Applications
Architecting Applications
 
Modern CI/CD in the microservices world with Kubernetes
Modern CI/CD in the microservices world with KubernetesModern CI/CD in the microservices world with Kubernetes
Modern CI/CD in the microservices world with Kubernetes
 
20141021 AWS Cloud Taekwon - Startup Best Practices on AWS
20141021 AWS Cloud Taekwon - Startup Best Practices on AWS20141021 AWS Cloud Taekwon - Startup Best Practices on AWS
20141021 AWS Cloud Taekwon - Startup Best Practices on AWS
 
MidSem
MidSemMidSem
MidSem
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
 
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 
Bootstrapping - Session 1 - Your First Week with Amazon EC2
Bootstrapping - Session 1 - Your First Week with Amazon EC2Bootstrapping - Session 1 - Your First Week with Amazon EC2
Bootstrapping - Session 1 - Your First Week with Amazon EC2
 
Discovery Day 2019 Sofia - Big data clusters
Discovery Day 2019 Sofia - Big data clustersDiscovery Day 2019 Sofia - Big data clusters
Discovery Day 2019 Sofia - Big data clusters
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value Store
 
Distribute key value_store
Distribute key value_storeDistribute key value_store
Distribute key value_store
 
LinuxCon 2013 Steven Dake on Using Heat for autoscaling OpenShift on Openstack
LinuxCon 2013 Steven Dake on Using Heat for autoscaling OpenShift on OpenstackLinuxCon 2013 Steven Dake on Using Heat for autoscaling OpenShift on Openstack
LinuxCon 2013 Steven Dake on Using Heat for autoscaling OpenShift on Openstack
 

Último

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 

Kafka connect

Notas del editor

  1. First me, hurray, then DataMountaineer, who we are and want we do, then Connect, why, what is it, how does it work. Hopefully time for a demo
  2. Big Data background, Clearstream, HFTs, Investment banks. We were Big Data experts but saw first hand Big data’s no good if it’s too slow. Big data is no good if its slow. Anyone know Knight Capital? Lost $440 million in 2008 in 30 minutes, that's 172,000 per second. Hourly or EOD risk reporting is no good, too late your dead. Put another way would cross the road outside with information that's 10 minutes old?
  3. Everybody will have been involved in data integration, data is it new oil. It should be easy....right? Just copy the data from that database and put it over here.
  4. We need to think about these items even if we are already using Kafka consumers and producers. Or we should be.
  5. All results in wasted time and resources. If your a CTO of a large organization this will make your toes curl. No common framework. Team A puts logic in his extract component, means it's not reusable. Team B re implements.
  6. Kafka connect, scalable and fault tolerant, addresses developers needs. One stack, all part of Kafka.
  7. Ingest data, unload data, buffer and persist data, process data all from one distribution.
  8. SAT4s is an example, configured 4 second windfarm signal flow from a SQL Server database to HDFS and Cassandra, no coding.
  9. Hazelcast requires that data is serializable, JSON and Avro are supported.
  10. Highest level component as a developer you deal with. You can have multiple instances of connectors per cluster, must have a unique name
  11. Config topic is a backing topic of each connect cluster. 1 Partition, guarantees order of configuration updates. Starts up you connector, asks it for task configurations assigns tasks to members of the cluster/consumer group. How does it do this? It implements Kafka Abstract Coordinator.
  12. Kafka notifies the Herder that a rebalance is happening, the framework, rebalances the partitions or tasks across the remaining consumer group or connect cluster.
  13. Basically a Consumer group behind the scenes, single threaded, so just like a regular if you have more tasks than partitions in you topics they'll be idle. No state, so good for restful, micro service architectures. it's in kafka.
  14. SinkRecords – is a wrapper, contains topic information, key and value schemas and payloads. Connect is configured with Converters that serialized/deserialize back and forth between the data in Kafka to Connect.
  15. You could for example write your own Protobuf converter and plug that.