Streaming Processing with a Distributed Commit Log
Apache Kafka
Apache Kafka committer and PMC member. A frequent
speaker on both Hadoop and Cassandra, Joe is the Co-Founder
and CTO of Elodina Inc. Joe has been a distributed
systems developer and architect for over {years} now having
built backend systems that supported over one hundred
million unique devices a day processing trillions of events. He
blogs and hosts a podcast about Hadoop and related systems
at All Things Hadoop.
@allthingshadoop
$(whoami)
● Introduction to Apache Kafka
● Brokers “as a Service”
● Producers & Consumers “as a Service”
● More Use Cases for Kafka
Overview
Apache Kafka
Apache Kafka was first open sourced by LinkedIn in 2011
Papers
● Building a Replicated Logging System with Apache Kafka http://www.vldb.org/pvldb/vol8/p1654-wang.pdf
● Kafka: A Distributed Messaging System for Log Processing http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
● Building LinkedIn’s Real-time Activity Data Pipeline http://sites.computer.org/debull/A12june/pipeline.pdf
● The Log: What Every Software Engineer Should Know About Real-time Data's Unifying Abstraction http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
http://kafka.apache.org/
Apache Kafka
It often starts with just one data pipeline
Data Pipelines
Point to Point Data Pipelines are Problematic
Reuse of data pipelines for new providers
Reuse of existing providers for new consumers
Eventually the solution becomes the problem
Decouple Data Pipelines
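To put rough numbers on why point-to-point pipelines become the problem, the sketch below compares how many pipelines must be built and maintained in each design. It assumes every producer feeds every consumer, which is a worst case chosen for illustration; the function names are hypothetical.

```python
def point_to_point_pipelines(producers: int, consumers: int) -> int:
    """Worst case: each producer needs its own pipeline to each consumer."""
    return producers * consumers


def decoupled_pipelines(producers: int, consumers: int) -> int:
    """With a shared log in the middle, each system connects exactly once."""
    return producers + consumers


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, point_to_point_pipelines(n, n), decoupled_pipelines(n, n))
```

The point-to-point count grows with the product of the two sides, while the decoupled count grows with their sum.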
Topics & Partitions
Log Segments
Read and Write Keys & Values to each partition
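The idea of topics, partitions, and keyed reads and writes can be sketched with an in-memory model. This is not Kafka code: the `Topic` class, its method names, and the use of CRC32 to pick a partition are assumptions made for illustration, and log segments are left out entirely.

```python
import zlib
from typing import List, Tuple

Record = Tuple[bytes, bytes]  # (key, value)


class Topic:
    """A topic is a set of partitions; each partition is an append-only log."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # CRC32 is an illustrative stand-in, not the hash a Kafka client uses.
        return zlib.crc32(key) % len(self.partitions)

    def append(self, key: bytes, value: bytes) -> Tuple[int, int]:
        """Append a record and return (partition, offset)."""
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition: int, offset: int, max_records: int = 10) -> List[Record]:
        """Read records from a partition starting at the given offset."""
        return self.partitions[partition][offset:offset + max_records]
```

Records with the same key land in the same partition, so their relative order is preserved within that partition; a reader tracks its position with an offset per partition.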
Producers
Consumers
Brokers
Kafka Wire Protocol - http://kafka.apache.org/protocol.html
● Preliminaries
○ Network
○ Partitioning and bootstrapping
○ Partitioning Strategies
○ Batching
○ Versioning and Compatibility
● The Protocol
○ Protocol Primitive Types
○ Notes on reading the request format grammars
○ Common Request and Response Structure
○ Message Sets
● Constants
○ Error Codes
○ Api Keys
● The Messages
● Some Common Philosophical Questions
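As a rough illustration of the "Common Request and Response Structure" item, the sketch below frames a request the way the protocol guide describes for the classic header: a 4-byte size, then api_key, api_version, correlation_id, and a length-prefixed client_id, all big-endian. The helper names are hypothetical, the request body is left empty, and newer protocol versions add fields not shown here.

```python
import struct
from typing import Tuple


def encode_string(text: str) -> bytes:
    """Encode a string as an int16 length followed by its UTF-8 bytes."""
    raw = text.encode("utf-8")
    return struct.pack(">h", len(raw)) + raw


def encode_request(api_key: int, api_version: int, correlation_id: int,
                   client_id: str, body: bytes = b"") -> bytes:
    """Build a size-prefixed request frame (header plus optional body)."""
    header = struct.pack(">hhi", api_key, api_version, correlation_id)
    payload = header + encode_string(client_id) + body
    return struct.pack(">i", len(payload)) + payload


def decode_response_header(frame: bytes) -> Tuple[int, int, bytes]:
    """Split a response frame into (size, correlation_id, remaining body)."""
    size, correlation_id = struct.unpack(">ii", frame[:8])
    return size, correlation_id, frame[8:4 + size]
```

The correlation_id is chosen by the client and echoed back by the broker, which is how a client matches responses to requests on one connection.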
Data Durability
Client Libraries
Community Clients https://cwiki.apache.org/confluence/display/KAFKA/Clients
● Go (aka golang) - Pure Go implementation with full protocol support.
Consumer and Producer implementations included, GZIP and Snappy
compression supported.
● Python - Pure Python implementation with full protocol support. Consumer
and Producer implementations included, GZIP and Snappy compression
supported.
● C - High performance C library with full protocol support
● Ruby - Pure Ruby, Consumer and Producer implementations included,
GZIP and Snappy compression supported. Ruby 1.9.3 and up (CI runs MRI
2.
● Clojure - Clojure DSL for the Kafka API
● JavaScript (NodeJS) - NodeJS client in a pure JavaScript implementation
Operationalizing Kafka
https://kafka.apache.org/documentation.html#basic_ops
Basic Kafka Operations
● Adding and removing topics
● Modifying topics
● Graceful shutdown
● Balancing leadership
● Checking consumer position
● Mirroring data between clusters
● Expanding your cluster
● Decommissioning brokers
● Increasing replication factor
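Operations such as expanding the cluster or increasing the replication factor are driven by a partition reassignment plan. The sketch below generates such a plan, assuming the JSON layout (version, partitions, topic, partition, replicas) described for the reassignment tool in the operations documentation. The round-robin placement and the function name are illustrative assumptions, not Kafka's own assignment algorithm.

```python
import json
from typing import Dict, List


def build_reassignment(topic: str, num_partitions: int,
                       brokers: List[int], replication_factor: int) -> Dict:
    """Spread each partition's replicas round-robin over the broker list."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    partitions = []
    for partition in range(num_partitions):
        replicas = [brokers[(partition + i) % len(brokers)]
                    for i in range(replication_factor)]
        partitions.append({"topic": topic, "partition": partition,
                           "replicas": replicas})
    return {"version": 1, "partitions": partitions}


if __name__ == "__main__":
    # Hypothetical topic name and broker ids.
    print(json.dumps(build_reassignment("events", 4, [1, 2, 3], 2), indent=2))
```

Any real plan should be reviewed against the current assignment before it is applied, since moving replicas copies data between brokers.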
Kafka “as a Service”
CURRENT STATE OF IMPLEMENTATION
11 STEPS BEFORE ANY BUSINESS VALUE IS CREATED
1 SET UP Instances → AWS / GCE / etc.
2 Repeat above by # of instances
3 SET UP uniformly, harden, secure every machine
4 DOWNLOAD: Apache Kafka
5 LEARN to install, run on multiple nodes / high availability
6 LEARN to run on multiple data centers / multiple racks
7 CONFIGURE nodes, tables specifically by cluster
8 MONITOR performance, isolate bottlenecks
9 OPTIMIZE system / team to hands off through next objective
10 MONITOR for failure and build disaster recovery protocol
11 FAILURE RECOVERY investigation, recovery and spin back up time
AND process must repeat by # of instances and technologies
ELODINA AUTOMATES DEPLOYMENT, SCALING AND MAINTENANCE
Reduce steps and learning curve to a THREE stage repeatable process
1 SET UP Instances → AWS / GCE / etc.
2 Repeat above by # of instances
3 SET UP uniformly, harden, secure every machine
4 DOWNLOAD: Apache Kafka
5 LEARN to install, run on multiple nodes / high availability
6 LEARN to run on multiple data centers / multiple racks
7 CONFIGURE nodes, tables specifically by cluster
8 MONITOR performance, isolate bottlenecks
9 OPTIMIZE system / team to hands off through next objective
10 MONITOR for failure and build disaster recovery protocol
11 FAILURE RECOVERY investigation, recovery and spin back up time
Platform modules allow for
deployment in minutes
DEPLOY
Grid scales automatically with low
latency based on real time traffic
patterns.
SCALE
Single destination to observe and
troubleshoot from CLI, REST API or
GUI
OBSERVE
BUILT-IN FRAMEWORKS DIRECTLY IN PLATFORM
Leading technologies deployable across any compute resource
Platform modules allow for
deployment in minutes
DEPLOY
Grid scales automatically with low
latency based on real time traffic
patterns.
SCALE
Single destination to observe and
troubleshoot from CLI, REST API
or GUI
OBSERVE
Resources
Technologies
IMMEDIATE OPERATIONAL BENEFITS
Removing Fragmentation with Interoperability
Cuts through a crowded market when deciding which software, or stack of software, to
choose and interoperate with your data center
Immediate Efficiency & Reliability
Operational resources deployed across multiple data centers and regions,
streamlined with dynamic compute and automated scheduling capabilities.
Automated Speed and Recovery
Reduce costs and time to market on development cycle time and Automate recovery
from failure and
What is Mesos?
Scheduler
Executors
mesos/kafka
https://github.com/mesos/kafka
Scheduler
● Provides the operational automation for a Kafka Cluster.
● Manages the changes to the broker's configuration.
● Exposes a REST API for the CLI to use or any other client.
● Runs on Marathon for high availability.
● Broker Failure Management “stickiness”
Executor
● The executor interacts with the Kafka broker as an
intermediary to the scheduler
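One reading of the "stickiness" bullet is that, after a broker fails, the scheduler prefers to relaunch it on the host it last ran on (where its data still lives) for a limited period, and only then accepts other hosts. The sketch below models that decision under this assumption. The class, function, and parameter names are hypothetical and are not taken from the mesos/kafka code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BrokerState:
    broker_id: int
    last_host: Optional[str] = None     # host the broker last ran on
    stopped_at: Optional[float] = None  # time the broker stopped, in seconds


def accepts_offer(broker: BrokerState, offer_host: str,
                  now: float, stickiness_seconds: float) -> bool:
    """Decide whether a resource offer from offer_host may host this broker."""
    if broker.last_host is None or broker.stopped_at is None:
        return True  # never ran before, so any host is acceptable
    within_period = (now - broker.stopped_at) < stickiness_seconds
    # Inside the period only the previous host matches; afterwards any host does.
    return offer_host == broker.last_host or not within_period
```

The trade-off is between recovery time and data movement: waiting for the old host avoids re-replicating the broker's partitions, while giving up after the period keeps the cluster from staying degraded.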
Scheduler & Executor
Typical Operations
● Run the scheduler with Docker
● Run the scheduler on Marathon
● Changing the location where data is stored
● Starting 3 brokers
● View broker log
● High Availability Scheduler State
● Failed Broker Recovery
● Passing multiple options
● Broker metrics
● Rolling restart
Navigating Operations
● Adding brokers to the cluster
● Updating broker configurations
● Starting brokers
● Stopping brokers
● Restarting brokers
● Removing brokers
● Retrieving broker log
● Rebalancing brokers in the cluster
● Listing topics
● Adding topic
● Updating topic
Kafka as a Service
Kafka Consumers
“as a Service”
http://heronstreaming.io
Topology Master
The Topology Master (TM) manages a topology
throughout its entire lifecycle, from the time it’s
submitted until it’s ultimately killed. When Heron
deploys a topology it starts a single TM and
multiple containers. The TM creates an
ephemeral ZooKeeper node to ensure that
there’s only one TM for the topology and that
the TM is easily discoverable by any process in
the topology. The TM also constructs the
physical plan for a topology which it relays to
different components.
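The way an ephemeral node guarantees a single, discoverable TM can be sketched without a real ZooKeeper. The in-memory `EphemeralRegistry` below is a hypothetical stand-in: creation fails if the path already exists, and a node disappears when its owner's session ends. The path and addresses are made up for illustration.

```python
from typing import Dict, Optional, Tuple


class EphemeralRegistry:
    """Toy model of ephemeral nodes: one owner per path, removed with the session."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Tuple[str, str]] = {}  # path -> (owner, data)

    def create_ephemeral(self, path: str, owner: str, data: str) -> bool:
        """Return True if this owner claimed the path, False if it is taken."""
        if path in self._nodes:
            return False
        self._nodes[path] = (owner, data)
        return True

    def get(self, path: str) -> Optional[str]:
        """Look up the data stored at a path, e.g. the TM's address."""
        entry = self._nodes.get(path)
        return entry[1] if entry else None

    def session_closed(self, owner: str) -> None:
        """Drop every node owned by a session that has ended."""
        for path in [p for p, (o, _) in self._nodes.items() if o == owner]:
            del self._nodes[path]
```

A second TM that tries to claim the same path is rejected, other processes find the live TM by reading the node, and the path frees itself if the TM dies.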
Container
Each Heron topology consists of multiple containers, each of which
houses multiple Heron Instances, a Stream Manager, and a Metrics Manager.
Containers communicate with the topology’s TM to ensure that the topology forms
a fully connected graph. For an illustration, see the figure in the Topology Master
section above.
Stream Manager
The Stream Manager (SM) manages the
routing of tuples between topology
components. Each Heron Instance in a
topology connects to its local SM, while all of
the SMs in a given topology connect to one
another to form a network. Below is a visual
illustration of a network of SMs:
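The routing role of the SM can be sketched as follows: an instance hands every tuple to its local SM, which delivers it directly if the destination is in the same container and otherwise forwards it to the peer SM that hosts the destination. This is a toy in-process model; the class, the `connect` helper, and the instance names are hypothetical.

```python
from typing import Dict, List, Tuple


class StreamManager:
    """Toy Stream Manager: one per container, fully connected to its peers."""

    def __init__(self, container_id: str) -> None:
        self.container_id = container_id
        self.local: Dict[str, List[Tuple]] = {}       # instance id -> inbox
        self.peers: Dict[str, "StreamManager"] = {}   # container id -> peer SM
        self.placement: Dict[str, str] = {}           # remote instance -> container

    def register(self, instance_id: str) -> List[Tuple]:
        """Attach a local instance and return its inbox."""
        inbox: List[Tuple] = []
        self.local[instance_id] = inbox
        return inbox

    def route(self, dest_instance: str, tup: Tuple) -> str:
        """Deliver locally when possible, otherwise forward to the owning peer."""
        if dest_instance in self.local:
            self.local[dest_instance].append(tup)
            return "local"
        self.peers[self.placement[dest_instance]].route(dest_instance, tup)
        return "remote"


def connect(managers: List[StreamManager]) -> None:
    """Link every SM to every other SM and share where instances live."""
    for sm in managers:
        for other in managers:
            if other is sm:
                continue
            sm.peers[other.container_id] = other
            for instance_id in other.local:
                sm.placement[instance_id] = other.container_id
```

Because instances only ever talk to their local SM, the number of connections between containers depends on the number of containers rather than the number of instances.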
Heron Instance
A Heron Instance (HI) is a process that handles a single task of a spout or bolt, which allows for easy
debugging and profiling.
Currently, Heron only supports Java, so all HIs are JVM processes, but this will change in the future.
Heron Instance Configuration
HIs have a variety of configurable parameters that you can adjust at each phase of a topology’s lifecycle.
Heron Instance
Back Pressure Built In
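Back pressure can be illustrated with a bounded buffer that tells its producer to stop when it fills up and to resume once it has drained. The sketch below uses high and low water marks; the class name and thresholds are assumptions for illustration and do not describe Heron's internal mechanism.

```python
from collections import deque
from typing import Any, Deque


class BoundedBuffer:
    """Buffer that raises a back-pressure flag between two water marks."""

    def __init__(self, high_water: int, low_water: int) -> None:
        self.high_water = high_water
        self.low_water = low_water
        self.items: Deque[Any] = deque()
        self.back_pressure = False

    def offer(self, item: Any) -> bool:
        """Accept the item unless back pressure is active."""
        if self.back_pressure:
            return False  # the upstream producer should pause and retry later
        self.items.append(item)
        if len(self.items) >= self.high_water:
            self.back_pressure = True
        return True

    def poll(self) -> Any:
        """Remove the oldest item and lift back pressure once drained enough."""
        item = self.items.popleft()
        if len(self.items) <= self.low_water:
            self.back_pressure = False
        return item
```

Using two thresholds instead of one avoids flapping: the producer is not restarted the moment a single slot frees up.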
Metrics Manager
Each topology runs a Metrics Manager (MM) that collects and exports metrics from all components in a
container. It then routes those metrics to both the Topology Master and to external collectors, such as
Scribe, Graphite, or analogous systems.
You can adapt Heron to support additional systems by implementing your own custom metrics sink.
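The custom sink idea can be sketched as a small plug-in interface: the Metrics Manager fans each record out to every registered sink, and each sink decides how to export it. The Python classes below are hypothetical and only show the shape of the pattern; Heron's real sink interface is part of its Java code base and may differ.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Tuple


class MetricsSink(ABC):
    """Hypothetical sink contract: receive records, then flush them out."""

    @abstractmethod
    def process_record(self, source: str, name: str, value: float) -> None: ...

    @abstractmethod
    def flush(self) -> None: ...


class InMemorySink(MetricsSink):
    """Example sink that 'exports' metrics into a dictionary on flush."""

    def __init__(self) -> None:
        self.pending: List[Tuple[str, str, float]] = []
        self.exported: Dict[str, float] = {}

    def process_record(self, source: str, name: str, value: float) -> None:
        self.pending.append((source, name, value))

    def flush(self) -> None:
        for source, name, value in self.pending:
            self.exported[f"{source}.{name}"] = value
        self.pending.clear()


class MetricsManager:
    """Collects metrics from components and routes them to all sinks."""

    def __init__(self, sinks: Iterable[MetricsSink]) -> None:
        self.sinks = list(sinks)

    def collect(self, source: str, name: str, value: float) -> None:
        for sink in self.sinks:
            sink.process_record(source, name, value)

    def flush(self) -> None:
        for sink in self.sinks:
            sink.flush()
```

Supporting another monitoring system then means writing one more sink class, without touching the components that emit metrics.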
Cluster-level Components
Heron CLI
Heron has a CLI tool called heron that is used to manage topologies. Documentation can be found in Managing Topologies.
Heron Tracker
The Heron Tracker (or just Tracker) is a centralized gateway for cluster-wide information about topologies, including which topologies are running,
being launched, being killed, etc. It relies on the same ZooKeeper nodes as the topologies in the cluster and exposes that information through a
JSON REST API. The Tracker can be run within your Heron cluster (on the same set of machines managed by your Heron scheduler) or outside
of it.
Instructions on running the tracker including JSON API docs can be found in Heron Tracker.
Heron UI
Heron UI is a rich visual interface that you can use to interact with topologies. Through Heron UI you can see color-coded visual representations of
the logical and physical plan of each topology in your cluster.
For more information, see the Heron UI document.
Other Kafka Use Cases
STACK EXAMPLE A
Use Case: Data Real-Time Analytics Ingestion
STACK EXAMPLE A+
Use Case: Data Real-Time Analytics Ingestion + Long Term Storage for Batch
STACK EXAMPLE B
Use Case: Real-Time Data Streaming/Processing
STACK EXAMPLE B+
Use Case: Real-Time Data Streaming/Processing + Feedback Loop
STACK EXAMPLE C
Use Case: Message Queuing
STACK EXAMPLE C+
Use Case: Message Queuing + Priority Management
STACK EXAMPLE D
Use Case: Distributed Akka Remoting for Real-Time Decisioning
STACK EXAMPLE D+
Use Case: Distributed Akka Remoting for Real-Time Decisioning + Long-Term Batch
STACK EXAMPLE E
Use Case: Distributed Trace Services

Más contenido relacionado

La actualidad más candente

Kafka meetup JP #3 - Engineering Apache Kafka at LINE
Kafka meetup JP #3 - Engineering Apache Kafka at LINEKafka meetup JP #3 - Engineering Apache Kafka at LINE
Kafka meetup JP #3 - Engineering Apache Kafka at LINEkawamuray
 
Building a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and PythonBuilding a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and PythonSingleStore
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpNathan Handler
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesPhil Peace
 
Stream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaStream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaAbhinav Singh
 
Apache Kafka
Apache KafkaApache Kafka
Apache KafkaJoe Stein
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBasedave_revell
 
Fully fault tolerant real time data pipeline with docker and mesos
Fully fault tolerant real time data pipeline with docker and mesos Fully fault tolerant real time data pipeline with docker and mesos
Fully fault tolerant real time data pipeline with docker and mesos Rahul Kumar
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingYaroslav Tkachenko
 
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...DataStax Academy
 
Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19confluent
 
Native container monitoring
Native container monitoringNative container monitoring
Native container monitoringRohit Jnagal
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaJoe Stein
 
Spark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics RevisedSpark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics RevisedMichael Spector
 
A Journey through the JDKs (Java 9 to Java 11)
A Journey through the JDKs (Java 9 to Java 11)A Journey through the JDKs (Java 9 to Java 11)
A Journey through the JDKs (Java 9 to Java 11)Markus Günther
 
Spark streaming and Kafka
Spark streaming and KafkaSpark streaming and Kafka
Spark streaming and KafkaIraj Hedayati
 
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...Data Con LA
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingGwen (Chen) Shapira
 

La actualidad más candente (20)

Kafka meetup JP #3 - Engineering Apache Kafka at LINE
Kafka meetup JP #3 - Engineering Apache Kafka at LINEKafka meetup JP #3 - Engineering Apache Kafka at LINE
Kafka meetup JP #3 - Engineering Apache Kafka at LINE
 
kafka
kafkakafka
kafka
 
Building a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and PythonBuilding a Real-Time Data Pipeline with Spark, Kafka, and Python
Building a Real-Time Data Pipeline with Spark, Kafka, and Python
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at Yelp
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Stream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaStream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache Kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
 
Fully fault tolerant real time data pipeline with docker and mesos
Fully fault tolerant real time data pipeline with docker and mesos Fully fault tolerant real time data pipeline with docker and mesos
Fully fault tolerant real time data pipeline with docker and mesos
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processing
 
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
 
Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19
 
Native container monitoring
Native container monitoringNative container monitoring
Native container monitoring
 
Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache Kafka
 
Spark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics RevisedSpark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics Revised
 
A Journey through the JDKs (Java 9 to Java 11)
A Journey through the JDKs (Java 9 to Java 11)A Journey through the JDKs (Java 9 to Java 11)
A Journey through the JDKs (Java 9 to Java 11)
 
Spark streaming and Kafka
Spark streaming and KafkaSpark streaming and Kafka
Spark streaming and Kafka
 
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision Making
 
spark-kafka_mod
spark-kafka_modspark-kafka_mod
spark-kafka_mod
 

Destacado

Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraJoe Stein
 
Data Storage Formats in Hadoop
Data Storage Formats in HadoopData Storage Formats in Hadoop
Data Storage Formats in HadoopBotond Balázs
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
Hadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With PythonHadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With PythonJoe Stein
 
Scaling MQTT With Apache Kafka
Scaling MQTT With Apache KafkaScaling MQTT With Apache Kafka
Scaling MQTT With Apache Kafkakellogh
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1Joe Stein
 
Apache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACKApache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACKMaxim Shelest
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkRahul Jain
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaJoe Stein
 
Koshy june27 140pm_room210_c_v4
Koshy june27 140pm_room210_c_v4Koshy june27 140pm_room210_c_v4
Koshy june27 140pm_room210_c_v4DataWorks Summit
 
Petascale Genomics (Strata Singapore 20151203)
Petascale Genomics (Strata Singapore 20151203)Petascale Genomics (Strata Singapore 20151203)
Petascale Genomics (Strata Singapore 20151203)Uri Laserson
 
Floods of Twitter Data - StampedeCon 2016
Floods of Twitter Data - StampedeCon 2016Floods of Twitter Data - StampedeCon 2016
Floods of Twitter Data - StampedeCon 2016StampedeCon
 
Log ingestion kafka -- impala using apex
Log ingestion   kafka -- impala using apexLog ingestion   kafka -- impala using apex
Log ingestion kafka -- impala using apexApache Apex
 
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016StampedeCon
 
jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011Joe Stein
 
Storing Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite ColumnsStoring Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite ColumnsJoe Stein
 
Toronto housing market_charts-february_2013
Toronto housing market_charts-february_2013Toronto housing market_charts-february_2013
Toronto housing market_charts-february_2013Amit Saini
 
Shifting Paradigms: Examining Pro-Thrombotic Activity from a Safety Perspective
Shifting Paradigms: Examining Pro-Thrombotic Activity from a Safety PerspectiveShifting Paradigms: Examining Pro-Thrombotic Activity from a Safety Perspective
Shifting Paradigms: Examining Pro-Thrombotic Activity from a Safety PerspectiveCorDynamics
 
Music video analysis part 1
Music video analysis part 1Music video analysis part 1
Music video analysis part 1Kirsty Evers
 

Destacado (20)

Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
 
Data Storage Formats in Hadoop
Data Storage Formats in HadoopData Storage Formats in Hadoop
Data Storage Formats in Hadoop
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Hadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With PythonHadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With Python
 
Scaling MQTT With Apache Kafka
Scaling MQTT With Apache KafkaScaling MQTT With Apache Kafka
Scaling MQTT With Apache Kafka
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1
 
Apache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACKApache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACK
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
Koshy june27 140pm_room210_c_v4
Koshy june27 140pm_room210_c_v4Koshy june27 140pm_room210_c_v4
Koshy june27 140pm_room210_c_v4
 
Petascale Genomics (Strata Singapore 20151203)
Petascale Genomics (Strata Singapore 20151203)Petascale Genomics (Strata Singapore 20151203)
Petascale Genomics (Strata Singapore 20151203)
 
Floods of Twitter Data - StampedeCon 2016
Floods of Twitter Data - StampedeCon 2016Floods of Twitter Data - StampedeCon 2016
Floods of Twitter Data - StampedeCon 2016
 
Log ingestion kafka -- impala using apex
Log ingestion   kafka -- impala using apexLog ingestion   kafka -- impala using apex
Log ingestion kafka -- impala using apex
 
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016
 
jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011
 
Storing Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite ColumnsStoring Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite Columns
 
Result_2012-13_XII
Result_2012-13_XIIResult_2012-13_XII
Result_2012-13_XII
 
Toronto housing market_charts-february_2013
Toronto housing market_charts-february_2013Toronto housing market_charts-february_2013
Toronto housing market_charts-february_2013
 
Shifting Paradigms: Examining Pro-Thrombotic Activity from a Safety Perspective
Shifting Paradigms: Examining Pro-Thrombotic Activity from a Safety PerspectiveShifting Paradigms: Examining Pro-Thrombotic Activity from a Safety Perspective
Shifting Paradigms: Examining Pro-Thrombotic Activity from a Safety Perspective
 
Music video analysis part 1
Music video analysis part 1Music video analysis part 1
Music video analysis part 1
 

Similar a Streaming Processing with a Distributed Commit Log

Distributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and ScalaDistributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and ScalaMax Alexejev
 
Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureTimothy Spann
 
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataMonitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataGetInData
 
Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin  Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin Kuberton
 
Docker Swarm secrets for creating great FIWARE platforms
Docker Swarm secrets for creating great FIWARE platformsDocker Swarm secrets for creating great FIWARE platforms
Docker Swarm secrets for creating great FIWARE platformsFederico Michele Facca
 
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpStrimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpJosé Román Martín Gil
 
Tungsten Fabric Overview
Tungsten Fabric OverviewTungsten Fabric Overview
Tungsten Fabric OverviewMichelle Holley
 
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OS
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OSPutting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OS
Putting Kafka In Jail – Best Practices To Run Kafka On Kubernetes & DC/OSLightbend
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham
 
Manchester MuleSoft Meetup #6 - Runtime Fabric with Mulesoft
Manchester MuleSoft Meetup #6 - Runtime Fabric with Mulesoft Manchester MuleSoft Meetup #6 - Runtime Fabric with Mulesoft
Manchester MuleSoft Meetup #6 - Runtime Fabric with Mulesoft Akshata Sawant
 
Build cloud native solution using open source
Build cloud native solution using open source Build cloud native solution using open source
Build cloud native solution using open source Nitesh Jadhav
 
Scaling Up Logging and Metrics
Scaling Up Logging and MetricsScaling Up Logging and Metrics
Scaling Up Logging and MetricsRicardo Lourenço
 
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...Timothy Spann
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...Athens Big Data
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...GetInData
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafkaemreakis
 
Stories from running Kafka on K8S.pdf
Stories from running Kafka on K8S.pdfStories from running Kafka on K8S.pdf
Stories from running Kafka on K8S.pdfAvinashUpadhyaya3
 
Dallas Mulesoft Meetup - Log Aggregation and Elastic Stack on Anypoint Platform
Dallas Mulesoft Meetup - Log Aggregation and Elastic Stack on Anypoint PlatformDallas Mulesoft Meetup - Log Aggregation and Elastic Stack on Anypoint Platform
Dallas Mulesoft Meetup - Log Aggregation and Elastic Stack on Anypoint PlatformAdam DesJardin
 

Similar a Streaming Processing with a Distributed Commit Log (20)

Distributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and ScalaDistributed & Highly Available server applications in Java and Scala
Distributed & Highly Available server applications in Java and Scala
 
Cloud lunch and learn real-time streaming in azure





Streaming Processing with a Distributed Commit Log

  • 1. Streaming Processing with a Distributed Commit Log Apache Kafka
  • 2. $(whoami)
      Apache Kafka committer and PMC member. A frequent speaker on both Hadoop and Cassandra, Joe is the Co-Founder and CTO of Elodina Inc. Joe has been a distributed systems developer and architect for over {years} now, having built backend systems that supported over one hundred million unique devices a day processing trillions of events. He blogs and hosts a podcast about Hadoop and related systems at All Things Hadoop. @allthingshadoop
  • 3. Overview
      ● Introduction to Apache Kafka
      ● Brokers “as a Service”
      ● Producers & Consumers “as a Service”
      ● More Use Cases for Kafka
  • 5. Apache Kafka (http://kafka.apache.org/)
      Apache Kafka was first open sourced by LinkedIn in 2011.
      Papers
      ● Building a Replicated Logging System with Apache Kafka http://www.vldb.org/pvldb/vol8/p1654-wang.pdf
      ● Kafka: A Distributed Messaging System for Log Processing http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
      ● Building LinkedIn’s Real-time Activity Data Pipeline http://sites.computer.org/debull/A12june/pipeline.pdf
      ● The Log: What Every Software Engineer Should Know About Real-time Data's Unifying Abstraction http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
  • 6. It often starts with just one data pipeline
  • 10. Point to Point Data Pipelines are Problematic
  • 11. Reuse of data pipelines for new providers
  • 12. Reuse of existing providers for new consumers
  • 13. Eventually the solution becomes the problem
  • 19. Read and Write Keys & Values to each partition
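As a rough illustration of how a keyed message ends up in one partition: a producer can hash the key bytes and take the result modulo the partition count, so equal keys always land in the same partition. The sketch below is a toy under stated assumptions, not Kafka's partitioner. `choose_partition` is a hypothetical helper, and CRC32 is swapped in as a stand-in for whatever hash a real client uses.

```python
import zlib
from typing import Optional


def choose_partition(key: Optional[bytes], num_partitions: int, counter: int = 0) -> int:
    """Hypothetical helper: pick a partition index for one message."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if key is None:
        # No key: spread messages round-robin using a caller-supplied counter.
        return counter % num_partitions
    # Keyed: a stable hash of the key bytes, so equal keys share a partition.
    # CRC32 is an assumption made for this sketch only.
    return zlib.crc32(key) % num_partitions
```

Because the mapping depends on the partition count, changing the number of partitions of a topic changes where a given key lands.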
  • 22. Brokers
      Kafka Wire Protocol - http://kafka.apache.org/protocol.html
      ● Preliminaries
          ○ Network
          ○ Partitioning and bootstrapping
          ○ Partitioning Strategies
          ○ Batching
          ○ Versioning and Compatibility
      ● The Protocol
          ○ Protocol Primitive Types
          ○ Notes on reading the request format grammars
          ○ Common Request and Response Structure
          ○ Message Sets
      ● Constants
          ○ Error Codes
          ○ Api Keys
      ● The Messages
      ● Some Common Philosophical Questions
  • 24. Client Libraries
      Community Clients https://cwiki.apache.org/confluence/display/KAFKA/Clients
      ● Go (aka golang) - Pure Go implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported.
      ● Python - Pure Python implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported.
      ● C - High performance C library with full protocol support
      ● Ruby - Pure Ruby, Consumer and Producer implementations included, GZIP and Snappy compression supported. Ruby 1.9.3 and up (CI runs MRI 2.
      ● Clojure - Clojure DSL for the Kafka API
      ● JavaScript (NodeJS) - NodeJS client in a pure JavaScript implementation
  • 25. Operationalizing Kafka
      https://kafka.apache.org/documentation.html#basic_ops
      Basic Kafka Operations
      ● Adding and removing topics
      ● Modifying topics
      ● Graceful shutdown
      ● Balancing leadership
      ● Checking consumer position
      ● Mirroring data between clusters
      ● Expanding your cluster
      ● Decommissioning brokers
      ● Increasing replication factor
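To make the "Protocol Primitive Types" and "Common Request and Response Structure" entries more concrete, here is a minimal sketch of request framing. It assumes the classic (non-flexible) header layout: a big-endian int32 size prefix, then int16 api_key, int16 api_version, int32 correlation_id, and a client_id string with an int16 length prefix. The helper names are hypothetical and the values in the usage note are samples only. The protocol guide remains the authority for any real client.

```python
import struct


def encode_string(value: str) -> bytes:
    """Length-prefixed string: int16 byte count, then UTF-8 bytes."""
    data = value.encode("utf-8")
    return struct.pack(">h", len(data)) + data


def encode_request(api_key: int, api_version: int, correlation_id: int,
                   client_id: str, body: bytes = b"") -> bytes:
    """Frame a request: int32 size, fixed header fields, client id, body."""
    header = struct.pack(">hhi", api_key, api_version, correlation_id)
    payload = header + encode_string(client_id) + body
    # The size prefix counts the bytes that follow it, not itself.
    return struct.pack(">i", len(payload)) + payload


def decode_header(frame: bytes):
    """Read back the size prefix and header fields from a framed request."""
    (size,) = struct.unpack_from(">i", frame, 0)
    api_key, api_version, correlation_id = struct.unpack_from(">hhi", frame, 4)
    (id_len,) = struct.unpack_from(">h", frame, 12)
    client_id = frame[14:14 + id_len].decode("utf-8")
    return size, api_key, api_version, correlation_id, client_id
```

For example, `decode_header(encode_request(3, 0, 42, "demo"))` round-trips the sample header fields. The correlation id is what lets a client match a response to the request it sent.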
  • 26. Kafka “as a Service”
  • 27. CURRENT STATE OF IMPLEMENTATION: 11 STEPS BEFORE ANY BUSINESS VALUE IS CREATED
      1 SET UP Instances → AWS / GCE / etc.
      2 Repeat above by # of instances
      3 SET UP uniformly, harden, secure every machine
      4 DOWNLOAD: Apache Kafka
      5 LEARN to install, run on multiple nodes / high availability
      6 LEARN to run on multiple data centers / multiple racks
      7 CONFIGURE nodes, tables specifically by cluster
      8 MONITOR performance, isolate bottlenecks
      9 OPTIMIZE system / team to hands off through next objective
      10 MONITOR for failure and build disaster recovery protocol
      11 FAILURE RECOVERY investigation, recovery and spin back up time
      AND process must repeat by # of instances and technologies
  • 28. ELODINA AUTOMATES DEPLOYMENT, SCALING AND MAINTENANCE
      Reduce steps and learning curve to a THREE stage repeatable process
      1 SET UP Instances → AWS / GCE / etc.
      2 Repeat above by # of instances
      3 SET UP uniformly, harden, secure every machine
      4 DOWNLOAD: Apache Kafka
      5 LEARN to install, run on multiple nodes / high availability
      6 LEARN to run on multiple data centers / multiple racks
      7 CONFIGURE nodes, tables specifically by cluster
      8 MONITOR performance, isolate bottlenecks
      9 OPTIMIZE system / team to hands off through next objective
      10 MONITOR for failure and build disaster recovery protocol
      11 FAILURE RECOVERY investigation, recovery and spin back up time
      ● DEPLOY: Platform modulars allow for deployment in minutes
      ● SCALE: Grid scales automatically with low latency based on real time traffic patterns.
      ● OBSERVE: Single destination to observe and troubleshoot from CLI, REST API or GUI
  • 29. BUILT-IN FRAMEWORKS DIRECTLY IN PLATFORM
      Leading technologies deployable across any compute resource
      ● DEPLOY: Platform modulars allow for deployment in minutes
      ● SCALE: Grid scales automatically with low latency based on real time traffic patterns.
      ● OBSERVE: Single destination to observe and troubleshoot from CLI, REST API or GUI
      Resources, Technologies
  • 30. IMMEDIATE OPERATIONAL BENEFITS
      ● Removing Fragmentation with Interoperability: Clearing crowded market decisioning on which software or stack of software to choose and interoperate with your data center
      ● Immediate Efficiency & Reliability: Operation resources deployed across multiple data centers across multiple regions streamlined with dynamic compute and automated scheduling capabilities.
      ● Automated Speed and Recovery: Reduce costs and time to market on development cycle time and Automate recovery from failure and
  • 39. Scheduler & Executor
      Scheduler
      ● Provides the operational automation for a Kafka Cluster.
      ● Manages the changes to the broker's configuration.
      ● Exposes a REST API for the CLI to use or any other client.
      ● Runs on Marathon for high availability.
      ● Broker Failure Management “stickiness”
      Executor
      ● The executor interacts with the Kafka broker as an intermediary to the scheduler
  • 40. Typical Operations
      ● Run the scheduler with Docker
      ● Run the scheduler on Marathon
      ● Changing the location where data is stored
      ● Starting 3 brokers
      ● View broker log
      ● High Availability Scheduler State
      ● Failed Broker Recovery
      ● Passing multiple options
      ● Broker metrics
      ● Rolling restart
  • 41. Navigating Operations
      ● Adding brokers to the cluster
      ● Updating broker configurations
      ● Starting brokers
      ● Stopping brokers
      ● Restarting brokers
      ● Removing brokers
      ● Retrieving broker log
      ● Rebalancing brokers in the cluster
      ● Listing topics
      ● Adding topic
      ● Updating topic
  • 42. Kafka as a Service
  • 49. Topology Master
      The Topology Master (TM) manages a topology throughout its entire lifecycle, from the time it’s submitted until it’s ultimately killed. When Heron deploys a topology it starts a single TM and multiple containers. The TM creates an ephemeral ZooKeeper node to ensure that there’s only one TM for the topology and that the TM is easily discoverable by any process in the topology. The TM also constructs the physical plan for a topology which it relays to different components.
      Container
      Each Heron topology consists of multiple containers, each of which houses multiple Heron Instances, a Stream Manager, and a Metrics Manager. Containers communicate with the topology’s TM to ensure that the topology forms a fully connected graph. For an illustration, see the figure in the Topology Master section above.
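The single-TM guarantee described above rests on two properties of an ephemeral node: creation fails if the path already exists, and the node disappears when its owner's session ends. The sketch below imitates those two properties with an in-memory class. `ToyCoordinator`, the path, and the session names are all hypothetical stand-ins, and no real ZooKeeper client API is used.

```python
class ToyCoordinator:
    """In-memory stand-in for a store of ephemeral nodes (not ZooKeeper)."""

    def __init__(self):
        self._nodes = {}  # path -> (session_id, data)

    def create_ephemeral(self, path, session_id, data):
        """Return True if this session claimed the path, False if taken."""
        if path in self._nodes:
            return False
        self._nodes[path] = (session_id, data)
        return True

    def close_session(self, session_id):
        """Drop every node owned by the session, as ephemeral nodes would."""
        owned = [p for p, (s, _) in self._nodes.items() if s == session_id]
        for path in owned:
            del self._nodes[path]

    def get(self, path):
        """Discovery: return the data stored at the path, or None."""
        entry = self._nodes.get(path)
        return entry[1] if entry else None
```

In this toy, the first process to create the node acts as the master and stores its address as the node data. Other processes read the same path to find it, and a standby can only take over once the first session has closed.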
  • 50. Stream Manager
      The Stream Manager (SM) manages the routing of tuples between topology components. Each Heron Instance in a topology connects to its local SM, while all of the SMs in a given topology connect to one another to form a network.
  • 51. Heron Instance
      A Heron Instance (HI) is a process that handles a single task of a spout or bolt, which allows for easy debugging and profiling. Currently, Heron only supports Java, so all HIs are JVM processes, but this will change in the future.
      Heron Instance Configuration
      HIs have a variety of configurable parameters that you can adjust at each phase of a topology’s lifecycle.
  • 54. Metrics Manager
      Each topology runs a Metrics Manager (MM) that collects and exports metrics from all components in a container. It then routes those metrics to both the Topology Master and to external collectors, such as Scribe, Graphite, or analogous systems. You can adapt Heron to support additional systems by implementing your own custom metrics sink.
  • 55. Cluster-level Components
      Heron CLI
      Heron has a CLI tool called heron that is used to manage topologies. Documentation can be found in Managing Topologies.
      Heron Tracker
      The Heron Tracker (or just Tracker) is a centralized gateway for cluster-wide information about topologies, including which topologies are running, being launched, being killed, etc. It relies on the same ZooKeeper nodes as the topologies in the cluster and exposes that information through a JSON REST API. The Tracker can be run within your Heron cluster (on the same set of machines managed by your Heron scheduler) or outside of it. Instructions on running the tracker, including JSON API docs, can be found in Heron Tracker.
      Heron UI
      Heron UI is a rich visual interface that you can use to interact with topologies. Through Heron UI you can see color-coded visual representations of the logical and physical plan of each topology in your cluster. For more information, see the Heron UI document.
  • 69. STACK EXAMPLE A. Use Case: Data Real-Time Analytics Ingestion
  • 70. STACK EXAMPLE A+. Use Case: Data Real-Time Analytics Ingestion + Long Term Storage for Batch
  • 71. STACK EXAMPLE B. Use Case: Real-Time Data Streaming/Processing
  • 72. STACK EXAMPLE B+. Use Case: Real-Time Data Streaming/Processing + Feedback Loop
  • 73. STACK EXAMPLE C. Use Case: Message Queuing
  • 74. STACK EXAMPLE C+. Use Case: Message Queuing + Priority Management
  • 75. STACK EXAMPLE D. Use Case: Distributed Akka Remoting for Real-Time Decisioning
  • 76. STACK EXAMPLE D+. Use Case: Distributed Akka Remoting for Real-Time Decisioning + Long-Term Batch
  • 77. STACK EXAMPLE E. Use Case: Distributed Trace Services