SlideShare una empresa de Scribd logo
1 de 42
Tame your router data
with Apache Kafka and Apache Druid
Rachel Pedreschi
rachel.pedreschi@imply.io
Eric Graham
eric.graham@imply.io
Tell ‘em what you are gonna tell ‘em
! The Who? Intro to your (slightly) nervous speakers
! The Why? What is the problem?
! The How? Introducing the OSS stack to solve all the world’s ills
! The Demo. So much demo.
2
The Who
3
Eric Graham. 

The Man, The Legend.
The one that wrote the paper
that got us accepted to this conference.
Rachel Pedreschi. 

Mostly Overhead.
The one that wrote the abstract
that got us accepted to this conference.
4
Part of the problem - The Data
5
Streaming Telemetry Flow Syslog Augmentation
A recent advancement to replace
SNMP. Provides streaming interface
vs. older pull model. Gives network
operators much quicker response to
deviations.
detailed network analysis around
TCP/IP flows through routers,
switches and firewalls. Flow data
includes src/dst MAC, src/dst IP,
Protocol, src/dst port, in/out
interface ID, TCP flags, TOS, BGP
information, Bytes/Packets and
more
System logs for routers and
switches
Routing, DNS, usernames make
visibility that much clearer
Telegraf, pipeline, sflowd
Tools - examples: PMACCT, Cento,
NIFI/NFDump
Syslog-ng
ksql, kstream, lookup tables, BGP
routing
Used to collect metrics on interface
stats, cpu, memory, disk space and
more.
Get detailed information on TCP/IP
packets
Textual information on whats going
on
Clearer visibility to make rapid decisions
Let’s make the data part of the solution!
6
OSS to the rescue!
7
Network analytics pipeline
Streaming architectures are true-to-life and enable faster decision cycles.
8Confidential. Do not redistribute.
Routers, Switches, Firewalls,
Hosts
Ingest
Application
Hostname mapping
Microservice name
Application name
Routing lookups
Enhance the data
Syslog
BGP, Flow
The Answer: Apache Kafka and Apache Druid
! Both built for modern data
architectures.
! Both can handle data at scale.
(largest Druid cluster over
2000 servers, 50Pb raw data)
! Full redundancy.
! Druid was developed for real-
time analytics.
! Both work in harmony together
helping get answers fast.
9
!10
What the heck is Apache Druid and Why
Should I Care?
11
!12
!13
!14
!15
The 90s: data warehouses and data marts
Tightly coupled architecture with limited flexibility.
Data
Data
Data
Data Sources
ETL Data
Warehouse
Processing Store and Compute
Analytics
Reporting
Data mining
Querying
Confidential. Do not redistribute. 16
!17
!18
The 2000s - present: data lakes
Separation of storage and compute enables flexibility in tools.
19
Data
Data
Data
Mapreduce
Reporting and Analytics
ELT
Data
Warehouse
ML/AI Engine
Search
system
Data
Lake
StorageData Sources
Confidential. Do not redistribute.
!20
The Now: data rivers
Streaming architectures enable faster decision cycles.
21
Data
Data
Data
Data Sources
Message bus
Data
Lake
Streaming OLAP
Confidential. Do not redistribute.
The problem
22
The problem
23
Typical Big Data++ Challenges
! Scale: when data is large, we need a lot of servers
! Speed: aiming for sub-second response time
! Complexity: too much fine grain to precompute
! High dimensionality: 10s or 100s of dimensions
! Concurrency: many users and tenants
! Freshness: load from streams
24
What were the options?
25
Search
platform
OLAP
! Real-time ingestion
! Flexible schema
! Full text search
! Batch ingestion
! Efficient storage
! Fast analytic queries
Timeseries
database
! Optimized storage for
time-based datasets
! Time-based functions
26
! Batch ingestion
! Efficient storage
! Fast analytic queries
Confidential. Do not redistribute.
Search
platform
OLAP
! Real-time ingestion
! Flexible schema
! Full text search
Timeseries
database
! Optimized storage for
time-based datasets
! Time-based functions
high performance
analytics database for
event-driven data
27
These guys have played a Druid…
28
Source: http://druid.io/druid-powered.html and imply.io
+ many more!
Gratuitous Customer Quote
“The performance is great ... some of the tables that we have internally in
Druid have billions and billions of events in them, and we’re scanning
them in under a second.”
29
Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts-its-hadoop-stuff.html
From Yahoo:
Shall we take a look?
30
Network analytics pipeline
Streaming architectures are true-to-life and enable faster decision cycles.
31Confidential. Do not redistribute.
Routers, Switches, Firewalls,
Hosts
Ingest
Application
Hostname mapping
Microservice name
Application name
Routing lookups
Enhance the data
Syslog
BGP, Flow
!32
curl -X POST -H 'Content-Type:
application/json' -d @supervisor-spec.json
http://localhost:8090/druid/indexer/v1/
supervisor
33
Use Cases
34
Use Case: Network troubleshooting
35
Use Case: Network troubleshooting
! Dashboards that include logs, flow and snmp (single pane of glass) for quick cross dataset
visualizations.
! Visualize spikes and dips and easily filter on specific data.
! Enhance the data to visualize names and not IPs/MAC addresses – but get the IPs when you
need them.
! Dashboards to show most interesting, common areas of interest.
! Alerting notifications for threshold breaches or deviation from normal.
! Is it the network or application? Enhanced datasets provide quick answers.
36
Use Case: DDOS and security
! Visualize spikes and dips and easily filter on specific data. (Geo, Attack vectors, known bad
actors)
! DDOS specific alerting (UDP badports, TCP Flags, Number of unique IPs, Overall increase)
! Hooks to multiple notification channels for always on notifications.
! Webhooks for integration with back office systems.
! Easily drill-down into
37
Use Case: BGP Analytics
! PMACCT can collect and add BGP information by peering with a BGP speaker.
! Use Kafka KSQL or Kstream to augment data with BGP information.
! Visualize the BGP AS_PATH (where you traffic is going across the Internet).
! Who are your top transit or peering partners.
! Top Source and Destination ASNs.
! Top BGP communities.
38
Download
Druid community site (current): http://druid.io/
Druid community site (new): https://druid.apache.org/
Imply distribution: https://imply.io/get-started
39
Contribute
40
https://github.com/apache/druid
Stay in touch
41
@druidio
Join the community!
http://druid.io/community
Come by our booth for a druid t-shirt and to learn more!
Follow the Druid project on Twitter!
Thank you!
!42
Hold for applause…

Más contenido relacionado

La actualidad más candente

ElastiCache Deep Dive: Design Patterns for In-Memory Data Stores (DAT302-R1) ...
ElastiCache Deep Dive: Design Patterns for In-Memory Data Stores (DAT302-R1) ...ElastiCache Deep Dive: Design Patterns for In-Memory Data Stores (DAT302-R1) ...
ElastiCache Deep Dive: Design Patterns for In-Memory Data Stores (DAT302-R1) ...Amazon Web Services
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
Distributed Caching in Kubernetes with Hazelcast
Distributed Caching in Kubernetes with HazelcastDistributed Caching in Kubernetes with Hazelcast
Distributed Caching in Kubernetes with HazelcastMesut Celik
 
ネットワーク超入門
ネットワーク超入門ネットワーク超入門
ネットワーク超入門xyzplus_net
 
[Confluent] 실시간 하이브리드, 멀티 클라우드 데이터 아키텍처로 빠르게 혀...
[Confluent] 실시간 하이브리드, 멀티 클라우드 데이터 아키텍처로 빠르게 혀...[Confluent] 실시간 하이브리드, 멀티 클라우드 데이터 아키텍처로 빠르게 혀...
[Confluent] 실시간 하이브리드, 멀티 클라우드 데이터 아키텍처로 빠르게 혀...confluent
 
Building Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache KafkaBuilding Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache Kafkaconfluent
 
19 08-22 introduction to activeMQ
19 08-22 introduction to activeMQ19 08-22 introduction to activeMQ
19 08-22 introduction to activeMQWoo Young Choi
 
NATS - A new nervous system for distributed cloud platforms
NATS - A new nervous system for distributed cloud platformsNATS - A new nervous system for distributed cloud platforms
NATS - A new nervous system for distributed cloud platformsDerek Collison
 
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...Kai Wähner
 
Kafka Quotas Talk at LinkedIn
Kafka Quotas Talk at LinkedInKafka Quotas Talk at LinkedIn
Kafka Quotas Talk at LinkedInAditya Auradkar
 
Kafka pub sub demo
Kafka pub sub demoKafka pub sub demo
Kafka pub sub demoSrish Kumar
 
Applications of Single Cell Analysis
Applications of Single  Cell AnalysisApplications of Single  Cell Analysis
Applications of Single Cell AnalysisQIAGEN
 
Hands on with CoAP and Californium
Hands on with CoAP and CaliforniumHands on with CoAP and Californium
Hands on with CoAP and CaliforniumJulien Vermillard
 
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeApache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeSlim Baltagi
 
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Prof. Wim Van Criekinge
 
DockerCon EU 2015: Day 1 General Session
DockerCon EU 2015: Day 1 General SessionDockerCon EU 2015: Day 1 General Session
DockerCon EU 2015: Day 1 General SessionDocker, Inc.
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...HostedbyConfluent
 

La actualidad más candente (20)

ElastiCache Deep Dive: Design Patterns for In-Memory Data Stores (DAT302-R1) ...
ElastiCache Deep Dive: Design Patterns for In-Memory Data Stores (DAT302-R1) ...ElastiCache Deep Dive: Design Patterns for In-Memory Data Stores (DAT302-R1) ...
ElastiCache Deep Dive: Design Patterns for In-Memory Data Stores (DAT302-R1) ...
 
Flow cytometry training garvan
Flow cytometry training garvanFlow cytometry training garvan
Flow cytometry training garvan
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Distributed Caching in Kubernetes with Hazelcast
Distributed Caching in Kubernetes with HazelcastDistributed Caching in Kubernetes with Hazelcast
Distributed Caching in Kubernetes with Hazelcast
 
ネットワーク超入門
ネットワーク超入門ネットワーク超入門
ネットワーク超入門
 
[Confluent] 실시간 하이브리드, 멀티 클라우드 데이터 아키텍처로 빠르게 혀...
[Confluent] 실시간 하이브리드, 멀티 클라우드 데이터 아키텍처로 빠르게 혀...[Confluent] 실시간 하이브리드, 멀티 클라우드 데이터 아키텍처로 빠르게 혀...
[Confluent] 실시간 하이브리드, 멀티 클라우드 데이터 아키텍처로 빠르게 혀...
 
Building Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache KafkaBuilding Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache Kafka
 
Rabbitmq basics
Rabbitmq basicsRabbitmq basics
Rabbitmq basics
 
19 08-22 introduction to activeMQ
19 08-22 introduction to activeMQ19 08-22 introduction to activeMQ
19 08-22 introduction to activeMQ
 
Tomcat
TomcatTomcat
Tomcat
 
NATS - A new nervous system for distributed cloud platforms
NATS - A new nervous system for distributed cloud platformsNATS - A new nervous system for distributed cloud platforms
NATS - A new nervous system for distributed cloud platforms
 
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
 
Kafka Quotas Talk at LinkedIn
Kafka Quotas Talk at LinkedInKafka Quotas Talk at LinkedIn
Kafka Quotas Talk at LinkedIn
 
Kafka pub sub demo
Kafka pub sub demoKafka pub sub demo
Kafka pub sub demo
 
Applications of Single Cell Analysis
Applications of Single  Cell AnalysisApplications of Single  Cell Analysis
Applications of Single Cell Analysis
 
Hands on with CoAP and Californium
Hands on with CoAP and CaliforniumHands on with CoAP and Californium
Hands on with CoAP and Californium
 
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeApache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
 
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
 
DockerCon EU 2015: Day 1 General Session
DockerCon EU 2015: Day 1 General SessionDockerCon EU 2015: Day 1 General Session
DockerCon EU 2015: Day 1 General Session
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
 

Similar a How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eric Graham, Imply Data) Kafka Summit London 2019

Open Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOCOpen Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOCSheetal Dolas
 
Lecture12 ie321 dr_atifshahzad - networks
Lecture12 ie321 dr_atifshahzad - networksLecture12 ie321 dr_atifshahzad - networks
Lecture12 ie321 dr_atifshahzad - networksAtif Shahzad
 
Big Data to SMART Data : Process Scenario
Big Data to SMART Data : Process ScenarioBig Data to SMART Data : Process Scenario
Big Data to SMART Data : Process ScenarioCHAKER ALLAOUI
 
eProsima RPC over DDS - OMG June 2013 Berlin Meeting
eProsima RPC over DDS - OMG June 2013 Berlin MeetingeProsima RPC over DDS - OMG June 2013 Berlin Meeting
eProsima RPC over DDS - OMG June 2013 Berlin MeetingJaime Martin Losa
 
Splunk app for stream
Splunk app for stream Splunk app for stream
Splunk app for stream csching
 
Realtime Detection of DDOS attacks using Apache Spark and MLLib
Realtime Detection of DDOS attacks using Apache Spark and MLLibRealtime Detection of DDOS attacks using Apache Spark and MLLib
Realtime Detection of DDOS attacks using Apache Spark and MLLibRyan Bosshart
 
Kentik Network@Scale (Dan Ellis)
Kentik Network@Scale (Dan Ellis)Kentik Network@Scale (Dan Ellis)
Kentik Network@Scale (Dan Ellis)gvillain
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
 
Large-Scale System Integration with DDS for SCADA, C2, and Finance
Large-Scale System Integration with DDS for SCADA, C2, and FinanceLarge-Scale System Integration with DDS for SCADA, C2, and Finance
Large-Scale System Integration with DDS for SCADA, C2, and FinanceRick Warren
 
Splunk Stream - Einblicke in Netzwerk Traffic
Splunk Stream - Einblicke in Netzwerk TrafficSplunk Stream - Einblicke in Netzwerk Traffic
Splunk Stream - Einblicke in Netzwerk TrafficSplunk
 
Introduction to networking
Introduction to networkingIntroduction to networking
Introduction to networkingMohsen Sarakbi
 
Insider Threat Visualization - HITB 2007, Kuala Lumpur
Insider Threat Visualization - HITB 2007, Kuala LumpurInsider Threat Visualization - HITB 2007, Kuala Lumpur
Insider Threat Visualization - HITB 2007, Kuala LumpurRaffael Marty
 
Malware vs Big Data
Malware vs Big DataMalware vs Big Data
Malware vs Big DataFrank Denis
 
Solving the Really Big Tech Problems with IoT
 Solving the Really Big Tech Problems with IoT Solving the Really Big Tech Problems with IoT
Solving the Really Big Tech Problems with IoTEric Kavanagh
 

Similar a How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eric Graham, Imply Data) Kafka Summit London 2019 (20)

Open Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOCOpen Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOC
 
Cisco OpenSOC
Cisco OpenSOCCisco OpenSOC
Cisco OpenSOC
 
Lecture12 ie321 dr_atifshahzad - networks
Lecture12 ie321 dr_atifshahzad - networksLecture12 ie321 dr_atifshahzad - networks
Lecture12 ie321 dr_atifshahzad - networks
 
Big Data to SMART Data : Process Scenario
Big Data to SMART Data : Process ScenarioBig Data to SMART Data : Process Scenario
Big Data to SMART Data : Process Scenario
 
eProsima RPC over DDS - OMG June 2013 Berlin Meeting
eProsima RPC over DDS - OMG June 2013 Berlin MeetingeProsima RPC over DDS - OMG June 2013 Berlin Meeting
eProsima RPC over DDS - OMG June 2013 Berlin Meeting
 
Tech
TechTech
Tech
 
Splunk app for stream
Splunk app for stream Splunk app for stream
Splunk app for stream
 
Realtime Detection of DDOS attacks using Apache Spark and MLLib
Realtime Detection of DDOS attacks using Apache Spark and MLLibRealtime Detection of DDOS attacks using Apache Spark and MLLib
Realtime Detection of DDOS attacks using Apache Spark and MLLib
 
Kentik Network@Scale (Dan Ellis)
Kentik Network@Scale (Dan Ellis)Kentik Network@Scale (Dan Ellis)
Kentik Network@Scale (Dan Ellis)
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
Large-Scale System Integration with DDS for SCADA, C2, and Finance
Large-Scale System Integration with DDS for SCADA, C2, and FinanceLarge-Scale System Integration with DDS for SCADA, C2, and Finance
Large-Scale System Integration with DDS for SCADA, C2, and Finance
 
BIG DATA
BIG DATABIG DATA
BIG DATA
 
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
 
Splunk Stream - Einblicke in Netzwerk Traffic
Splunk Stream - Einblicke in Netzwerk TrafficSplunk Stream - Einblicke in Netzwerk Traffic
Splunk Stream - Einblicke in Netzwerk Traffic
 
Introduction to networking
Introduction to networkingIntroduction to networking
Introduction to networking
 
Big Data: hype or necessity?
Big Data: hype or necessity?Big Data: hype or necessity?
Big Data: hype or necessity?
 
Insider Threat Visualization - HITB 2007, Kuala Lumpur
Insider Threat Visualization - HITB 2007, Kuala LumpurInsider Threat Visualization - HITB 2007, Kuala Lumpur
Insider Threat Visualization - HITB 2007, Kuala Lumpur
 
Networking 101 english
Networking 101   englishNetworking 101   english
Networking 101 english
 
Malware vs Big Data
Malware vs Big DataMalware vs Big Data
Malware vs Big Data
 
Solving the Really Big Tech Problems with IoT
 Solving the Really Big Tech Problems with IoT Solving the Really Big Tech Problems with IoT
Solving the Really Big Tech Problems with IoT
 

Más de confluent

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 

Más de confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Último

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 

Último (20)

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eric Graham, Imply Data) Kafka Summit London 2019

  • 1. Tame your router data with Apache Kafka and Apache Druid Rachel Pedreschi rachel.pedreschi@imply.io Eric Graham eric.graham@imply.io
  • 2. Tell ‘em what you are gonna tell ‘em ! The Who? Intro to your (slightly) nervous speakers ! The Why? What is the problem? ! The How? Introducing the OSS stack to solve all the world’s ills ! The Demo. So much demo. 2
  • 3. The Who 3 Eric Graham. 
 The Man, The Legend. The one that wrote the paper that got us accepted to this conference. Rachel Pedreschi. 
 Mostly Overhead. The one that wrote the abstract that got us accepted to this conference.
  • 4. 4
  • 5. Part of the problem - The Data 5 Streaming Telemetry Flow Syslog Augmentation A recent advancement to replace SNMP. Provides streaming interface vs. older pull model. Gives network operators much quicker response to deviations. detailed network analysis around TCP/IP flows through routers, switches and firewalls. Flow data includes src/dst MAC, src/dst IP, Protocol, src/dst port, in/out interface ID, TCP flags, TOS, BGP information, Bytes/Packets and more System logs for routers and switches Routing, DNS, usernames make visibility that much clearer Telegraf, pipeline, sflowd Tools - examples: PMACCT, Cento, NIFI/NFDump Syslog-ng ksql, kstream, lookup tables, BGP routing Used to collect metrics on interface stats, cpu, memory, disk space and more. Get detailed information on TCP/IP packets Textual information on whats going on Clearer visibility to make rapid decisions
  • 6. Let’s make the data part of the solution! 6
  • 7. OSS to the rescue! 7
  • 8. Network analytics pipeline Streaming architectures are true-to-life and enable faster decision cycles. 8Confidential. Do not redistribute. Routers, Switches, Firewalls, Hosts Ingest Application Hostname mapping Microservice name Application name Routing lookups Enhance the data Syslog BGP, Flow
  • 9. The Answer: Apache Kafka and Apache Druid ! Both built for modern data architectures. ! Both can handle data at scale. (largest Druid cluster over 2000 servers, 50Pb raw data) ! Full redundancy. ! Druid was developed for real- time analytics. ! Both work in harmony together helping get answers fast. 9
  • 10. !10
  • 11. What the heck is Apache Druid and Why Should I Care? 11
  • 12. !12
  • 13. !13
  • 14. !14
  • 15. !15
  • 16. The 90s: data warehouses and data marts Tightly coupled architecture with limited flexibility. Data Data Data Data Sources ETL Data Warehouse Processing Store and Compute Analytics Reporting Data mining Querying Confidential. Do not redistribute. 16
  • 17. !17
  • 18. !18
  • 19. The 2000s - present: data lakes Separation of storage and compute enables flexibility in tools. 19 Data Data Data Mapreduce Reporting and Analytics ELT Data Warehouse ML/AI Engine Search system Data Lake StorageData Sources Confidential. Do not redistribute.
  • 20. !20
  • 21. The Now: data rivers Streaming architectures enable faster decision cycles. 21 Data Data Data Data Sources Message bus Data Lake Streaming OLAP Confidential. Do not redistribute.
  • 24. Typical Big Data++ Challenges ! Scale: when data is large, we need a lot of servers ! Speed: aiming for sub-second response time ! Complexity: too much fine grain to precompute ! High dimensionality: 10s or 100s of dimensions ! Concurrency: many users and tenants ! Freshness: load from streams 24
  • 25. What were the options? 25 Search platform OLAP ! Real-time ingestion ! Flexible schema ! Full text search ! Batch ingestion ! Efficient storage ! Fast analytic queries Timeseries database ! Optimized storage for time-based datasets ! Time-based functions
  • 26. 26 ! Batch ingestion ! Efficient storage ! Fast analytic queries Confidential. Do not redistribute. Search platform OLAP ! Real-time ingestion ! Flexible schema ! Full text search Timeseries database ! Optimized storage for time-based datasets ! Time-based functions high performance analytics database for event-driven data
  • 27. 27
  • 28. These guys have played a Druid… 28 Source: http://druid.io/druid-powered.html and imply.io + many more!
  • 29. Gratuitous Customer Quote “The performance is great ... some of the tables that we have internally in Druid have billions and billions of events in them, and we’re scanning them in under a second.” 29 Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts-its-hadoop-stuff.html From Yahoo:
  • 30. Shall we take a look? 30
  • 31. Network analytics pipeline Streaming architectures are true-to-life and enable faster decision cycles. 31Confidential. Do not redistribute. Routers, Switches, Firewalls, Hosts Ingest Application Hostname mapping Microservice name Application name Routing lookups Enhance the data Syslog BGP, Flow
  • 32. !32 curl -X POST -H 'Content-Type: application/json' -d @supervisor-spec.json http://localhost:8090/druid/indexer/v1/ supervisor
  • 33. 33
  • 35. Use Case: Network troubleshooting 35
  • 36. Use Case: Network troubleshooting ! Dashboards that include logs, flow and snmp (single pane of glass) for quick cross dataset visualizations. ! Visualize spikes and dips and easily filter on specific data. ! Enhance the data to visualize names and not IPs/MAC addresses – but get the IPs when you need them. ! Dashboards to show most interesting, common areas of interest. ! Alerting notifications for threshold breaches or deviation from normal. ! Is it the network or application? Enhanced datasets provide quick answers. 36
  • 37. Use Case: DDOS and security ! Visualize spikes and dips and easily filter on specific data. (Geo, Attack vectors, known bad actors) ! DDOS specific alerting (UDP badports, TCP Flags, Number of unique IPs, Overall increase) ! Hooks to multiple notification channels for always on notifications. ! Webhooks for integration with back office systems. ! Easily drill-down into 37
  • 38. Use Case: BGP Analytics ! PMACCT can collect and add BGP information by peering with a BGP speaker. ! Use Kafka KSQL or Kstream to augment data with BGP information. ! Visualize the BGP AS_PATH (where you traffic is going across the Internet). ! Who are your top transit or peering partners. ! Top Source and Destination ASNs. ! Top BGP communities. 38
  • 39. Download Druid community site (current): http://druid.io/ Druid community site (new): https://druid.apache.org/ Imply distribution: https://imply.io/get-started 39
  • 41. Stay in touch 41 @druidio Join the community! http://druid.io/community Come by our booth for a druid t-shirt and to learn more! Follow the Druid project on Twitter!
  • 42. Thank you! !42 Hold for applause…