SlideShare una empresa de Scribd logo
1 de 47
Descargar para leer sin conexión
Network Traffic Visibility and
Anomaly Detection
@Scale: October 27th, 2016
Dan Ellis
• What is network traffic visibility?
• My relationship with traffic tools
• 20+ years of frustrations
• ISP’s, CDNs, enterprises
• Who is Kentik
Goal of this talk: Make your life easier
Introduction
• Data networks can be
compared to FedEx
• Imagine FedEx without
package tracking
• Majority of data
networks operate in
this vacuum of visibility
• Hard to believe?
Problem is massive
data scale, lack of
tools, little network +
systems collaboration
Traffic Visibility Problem
4
Not Helping…
•Interface Volume
(Mb/s, pps)?
A Poll
•Interface Volume
(Mb/s, pps)?
•Src/Dst IP+Port,
ASN, BGP Path?
A Poll
A Poll
•Interface Volume
(Mb/s, pps)?
•Src/Dst IP+Port,
ASN, BGP Path?
•IP, Port, ASN
or Path Thresholds?
?Maybe there isn’t a traffic visibility problem
Maybe no one really needs this data
Is this really a problem?
9
Complaints of high latency… BGP Path to Comcast
10
Dyn attack last week – ISP recursive inbound
Dyn attack last week – Traffic / source_ip
Dyn attack last week – ISP recursive outbound
Use cases of traffic visibility
• Network Planning
• Peering Analytics and Abuse
• Congestion detection
• Is it the network?
• Where on the network?
• Proactive alerting
• Distributed DDoS Detection
• What Changed Post Deploy?
• Security and Breach
Detection
• Cost Analytics
• Revenue Identification
(New + Risk)
• Enabling Internal Groups
Tenets
• Infinite granularity storage for months
• Drillable visibility, network specific UI
• Real-time and fast (< 10s queries)
• Anomaly detection + actions
• Open / API
• Scale
Now we know what we need,
how do we do it?
TCP stats data / app specific data
Where to find this data ?
Flow data
NetFlow, SFlow, IPFIX
SNMP interfaces info
Sys/Event logs
TACACS
or
Syslog
App
Server
BGP Path info
NETWORK
+
+
+
=
Something actually
useful
+
Router
Router
PCAP
agent
What kind of tools
• Current Open Source:
• Older Open Source:
• Commercial software:
• DIY Big Data:
• On-Prem Big Data:
• SaaS Big Data:
pmacct, ntop, SiLK, cacti
cflowd, AS-PATH, RRDtool
Arbor, Plixer, SevOne, Solarwinds, ManageEngine
Kafka + ELK, Hadoop, druid, grafana, tsdb
Cisco Tetration, Deepfield…
Kentik, Datadog, Appneta, Splunk
18
Many tools gets you almost there
Really though… (tools)
Open source (ish):
• Pmacct
• Nprobe / Ntop
• Elastic Search + Kibana (ELK)
Commercial:
• Arbor
• Kentik
Three layer approach
Ingest &
Fusion layer
Storage Layer
(flow specific)
Query
Layer
Each layer has separate and different scaling
characteristics
Query engine
and UI
Query
interfaces
SQL
WWW
REST
data
sources
clients
SELECT flow
FROM router
WHERE …
>_
Seriously?
How much data
• Small network (< 10Gb/s traf.)
• Large network (1 Tb/s traf.)
• Querying over 30+ days
10k flows/sec (+rows/sec)
500k flows/sec
@ 200k fps (518 B rows, 207 TB) in < 10s
Data fusion
is a key enabler to useful data
Distributed system data ingest
Proxy
DATA FUSION
Decoder
Modules
Mem
Tables
NetFlow v5
NetFlow v9
IPFIX
BGP RIB
Custom Tags
SNMP Poller
BGP
Daemon
Enrichment
DB
DATA FUSION
Geo ß à IP
ASN ß à IP
SFlow
ROUTER
FLOW FRIENDLY DATASTORE
Single flow
fused row
sent to storage
Fusing should be near real-time, performed at
ingest and data specific
Proxy
Agent
CLIENT PROCESS
Decoder
Modules
Mem
Tables
NetFlow v5
NetFlow v9
IPFIX
Unified Flow
BGP RIB
Custom Tags
SNMP Poller
BGP
Daemon
Enrichment
DB
DATA FUSION
Geo ß à IP
ASN ß à IP
SFlow
ROUTER
FLOW FRIENDLY DATASTORE
Single flow
fused row
sent to storage
Distributed system data ingest
Serious. Flux Capacitor. Is. Serious.
Network planning: traffic by BGP hop
Network planning: collapsed path, exclude 1st
Looking at existing architectures
out there
PMAcct-based implementation
BGP
Flow
PCAP
Ingest &
Fusion layer
Storage Layer
Query
Layer
data
sources
clients
frontend
Nprobe + Ntop + ElasticSearch
BGP
Flow
nProbe
nProbe(s)
Ingest &
Fusion layer
Storage Layer
Query
Layer
data
sources
clients
BGP
Flow
• Dropbox implementation of a (mostly) open-source NetFlow solution
here: Dropbox blog
• Requires custom ingest, fusing, UI
Kafka + Elastic Search
• Ingest:
• Data-store:
• Query frontends very generic:
Distributing and scaling (1xNProbe = 1xDevice)
No SNMP (= no IF info available for fusion)
Aggregation (no infinite granularity)
Challenging at scale when ES
very hard for MySQL/MongoDB
Tailoring of meaningful dashboards difficult
No anomaly detection
Caveats
Commercial HW solutions (Arbor)
Appliance based
not truly distributed
pre-determined list of aggregated data (no infinite granularity)
And so…
Why isn’t everybody already doing it?
Required areas of expertise
(because every presentation needs a Vin diagram)
Distributed
systems
engineers
Network
Engineers
SREs
Low-level
Network
developers
Resilience / Reliability
Geo-distributed ingest
Flow friendly data-store
BGP Daemon
Flow inspection & conversion
Network protocols hacking
Make all of the above
work reliably
Train all the other teams
on the involved network
protocols and their usage
Unicorn
Looking beyond the basics
Once you have a platform, what’s next?
• Augmented flow (retransmits, latency, URL, DNS)
• Anomaly detection
• Multi-hop exit determination
• BGP-path congestion detection
Building on top of the platform
Imagine if we could get performance data from the network:
• Q Depth
• Retransmits per flow
• TCP latency
• Application Latency
You can:
• Nprobe (ntop) collects Latency, Rxmits, URL, DNS -> IPFIX flow
• Deploy on a host or a sensor
• Cisco, Juniper, Arista working to expose Q Depth into flow
• Elastic Search, Plixer, Kentik can consume additional IPFIX fields
Building on top of the platform
Retransmits enhanced flow: rexmits / interface
Retransmits enhanced flow: rexmits / ASN
Retransmits enhanced flow: TCP latency / ASN
You shouldn’t have to stare at dashboards or watch logs to detect
badness
Monitor top-x of any dimension combination (IP, ASN’s, Geo, Interface)
Create baselines based on time of day
Be able to look at things beyond pps/bps such as retransmits, latency, logs
Detect shifts: did an ASN or IP on a particular interface suddenly move from
top-x #200 to #2 and that is unusual for this time of day
This is available today (Open Source: Hadoop, Spark, Storm, Samza, Flink)
Building on top of the platform
Use case: anomaly detection
Traffic	from	one	ASN	(network)	unusually	high.	Operator	notified	at	red	line.
Use case: traffic anomaly detection & annotation
Use case: traffic annotated w/ multiple events
Anomaly detection: DDoS detection & characteristics
Once you have a platform, what’s next?
üAugmented flow (retransmits, latency, URL, DNS)
üAnomaly detection
q Multi-hop exit determination
Challenging to map traffic from ingest to exit point, multi-hop
q BGP-path congestion detection
Detect individual congested paths within a circuit that isn’t
congested
Building on top of the platform
Summary
Networks can produce large amounts of data that will make your life
easier
Big Data platforms are able to consume this data
Specific tools for Network Operators are beginning to appear (free &
paid)
Paid tools are more specific to network use (UI, easy setup, etc)
Free tools have the “power” but require cobbling together pieces
Much work to be done re fusing data such as logs, changes, alerts, DNS
SaaS providers will provide community views and enable data-sharing
QUESTIONS ?
Dan Ellis
dan@kentik.com
CREDITS

Más contenido relacionado

La actualidad más candente

Intro to Kapacitor for Alerting and Anomaly Detection
Intro to Kapacitor for Alerting and Anomaly DetectionIntro to Kapacitor for Alerting and Anomaly Detection
Intro to Kapacitor for Alerting and Anomaly DetectionInfluxData
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...Altinity Ltd
 
FIWARE Tech Summit - FIWARE Cygnus and STH-Comet
FIWARE Tech Summit - FIWARE Cygnus and STH-CometFIWARE Tech Summit - FIWARE Cygnus and STH-Comet
FIWARE Tech Summit - FIWARE Cygnus and STH-CometFIWARE
 
Orion NTA Customer Training
Orion NTA Customer TrainingOrion NTA Customer Training
Orion NTA Customer TrainingSolarWinds
 
Monitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSMonitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSAmazon Web Services
 
Spotify architecture - Pressing play
Spotify architecture - Pressing playSpotify architecture - Pressing play
Spotify architecture - Pressing playNiklas Gustavsson
 
Intrusion prevention system(ips)
Intrusion prevention system(ips)Intrusion prevention system(ips)
Intrusion prevention system(ips)Papun Papun
 
Rpc Case Studies (Distributed computing)
Rpc Case Studies (Distributed computing)Rpc Case Studies (Distributed computing)
Rpc Case Studies (Distributed computing)Sri Prasanna
 
Nmap basics
Nmap basicsNmap basics
Nmap basicsitmind4u
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used forAljoscha Krettek
 
Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Mayank Shrivastava
 
PACKET Sniffer IMPLEMENTATION
PACKET Sniffer IMPLEMENTATIONPACKET Sniffer IMPLEMENTATION
PACKET Sniffer IMPLEMENTATIONGoutham Royal
 
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby NodeHadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby NodeErik Krogen
 
Building an Observability platform with ClickHouse
Building an Observability platform with ClickHouseBuilding an Observability platform with ClickHouse
Building an Observability platform with ClickHouseAltinity Ltd
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)DataWorks Summit
 

La actualidad más candente (20)

Intro to Kapacitor for Alerting and Anomaly Detection
Intro to Kapacitor for Alerting and Anomaly DetectionIntro to Kapacitor for Alerting and Anomaly Detection
Intro to Kapacitor for Alerting and Anomaly Detection
 
Mininet Basics
Mininet BasicsMininet Basics
Mininet Basics
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
HDInsight for Architects
HDInsight for ArchitectsHDInsight for Architects
HDInsight for Architects
 
FIWARE Tech Summit - FIWARE Cygnus and STH-Comet
FIWARE Tech Summit - FIWARE Cygnus and STH-CometFIWARE Tech Summit - FIWARE Cygnus and STH-Comet
FIWARE Tech Summit - FIWARE Cygnus and STH-Comet
 
Orion NTA Customer Training
Orion NTA Customer TrainingOrion NTA Customer Training
Orion NTA Customer Training
 
Monitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSMonitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECS
 
Nifi workshop
Nifi workshopNifi workshop
Nifi workshop
 
Spotify architecture - Pressing play
Spotify architecture - Pressing playSpotify architecture - Pressing play
Spotify architecture - Pressing play
 
Intrusion prevention system(ips)
Intrusion prevention system(ips)Intrusion prevention system(ips)
Intrusion prevention system(ips)
 
Rpc Case Studies (Distributed computing)
Rpc Case Studies (Distributed computing)Rpc Case Studies (Distributed computing)
Rpc Case Studies (Distributed computing)
 
Nmap basics
Nmap basicsNmap basics
Nmap basics
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020
 
PACKET Sniffer IMPLEMENTATION
PACKET Sniffer IMPLEMENTATIONPACKET Sniffer IMPLEMENTATION
PACKET Sniffer IMPLEMENTATION
 
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby NodeHadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node
 
Cluster Computing
Cluster ComputingCluster Computing
Cluster Computing
 
Intrusion Prevention System
Intrusion Prevention SystemIntrusion Prevention System
Intrusion Prevention System
 
Building an Observability platform with ClickHouse
Building an Observability platform with ClickHouseBuilding an Observability platform with ClickHouse
Building an Observability platform with ClickHouse
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)
 

Destacado

Nokia Big Data and Analytics
Nokia Big Data and AnalyticsNokia Big Data and Analytics
Nokia Big Data and Analyticsjthaskell
 
Big Data Expo 2015 - Schiphol Big Data @ Schiphol
Big Data Expo 2015 - Schiphol Big Data @ SchipholBig Data Expo 2015 - Schiphol Big Data @ Schiphol
Big Data Expo 2015 - Schiphol Big Data @ SchipholBigDataExpo
 
pmacct, kafka, presto, re:dash を使った高速なflow解析
pmacct, kafka, presto, re:dash を使った高速なflow解析pmacct, kafka, presto, re:dash を使った高速なflow解析
pmacct, kafka, presto, re:dash を使った高速なflow解析 Kaname Nishizuka
 
CPN102 Your First Week with Amazon Elastic Compute Cloud - AWS re: Invent …
CPN102 Your First Week with Amazon Elastic Compute Cloud - AWS re: Invent …CPN102 Your First Week with Amazon Elastic Compute Cloud - AWS re: Invent …
CPN102 Your First Week with Amazon Elastic Compute Cloud - AWS re: Invent …Amazon Web Services
 
Cloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisCloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisAlex Henthorn-Iwane
 

Destacado (8)

Cloud Aware Network Management
Cloud Aware Network ManagementCloud Aware Network Management
Cloud Aware Network Management
 
Nokia Big Data and Analytics
Nokia Big Data and AnalyticsNokia Big Data and Analytics
Nokia Big Data and Analytics
 
Big Data Expo 2015 - Schiphol Big Data @ Schiphol
Big Data Expo 2015 - Schiphol Big Data @ SchipholBig Data Expo 2015 - Schiphol Big Data @ Schiphol
Big Data Expo 2015 - Schiphol Big Data @ Schiphol
 
deepfield_networks
deepfield_networksdeepfield_networks
deepfield_networks
 
Next-Gen DDoS Detection
Next-Gen DDoS DetectionNext-Gen DDoS Detection
Next-Gen DDoS Detection
 
pmacct, kafka, presto, re:dash を使った高速なflow解析
pmacct, kafka, presto, re:dash を使った高速なflow解析pmacct, kafka, presto, re:dash を使った高速なflow解析
pmacct, kafka, presto, re:dash を使った高速なflow解析
 
CPN102 Your First Week with Amazon Elastic Compute Cloud - AWS re: Invent …
CPN102 Your First Week with Amazon Elastic Compute Cloud - AWS re: Invent …CPN102 Your First Week with Amazon Elastic Compute Cloud - AWS re: Invent …
CPN102 Your First Week with Amazon Elastic Compute Cloud - AWS re: Invent …
 
Cloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisCloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow Analysis
 

Similar a Kentik Network@Scale (Dan Ellis)

Co se skrývá v datovém provozu? - Pavel Minařík
Co se skrývá v datovém provozu? - Pavel MinaříkCo se skrývá v datovém provozu? - Pavel Minařík
Co se skrývá v datovém provozu? - Pavel MinaříkSecurity Session
 
DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax Academy
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...Altinity Ltd
 
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...confluent
 
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...confluent
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About Jesus Rodriguez
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksDataWorks Summit
 
Edward King SPEDDEXES 2014
Edward King SPEDDEXES 2014Edward King SPEDDEXES 2014
Edward King SPEDDEXES 2014aceas13tern
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksMapR Technologies
 
Network Situational Awareness with d00gle
Network Situational Awareness with d00gleNetwork Situational Awareness with d00gle
Network Situational Awareness with d00gleDug Song
 
The Background Noise of the Internet
The Background Noise of the InternetThe Background Noise of the Internet
The Background Noise of the InternetAndrew Morris
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...DataWorks Summit/Hadoop Summit
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
 
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics EcosystemXDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics EcosystemDan Eaton
 
Splunk Conf2010: Corporate Express presents Splunk with SAP
Splunk Conf2010: Corporate Express presents Splunk with SAPSplunk Conf2010: Corporate Express presents Splunk with SAP
Splunk Conf2010: Corporate Express presents Splunk with SAPSplunk
 
Wasp2 - IoT and Streaming Platform
Wasp2 - IoT and Streaming PlatformWasp2 - IoT and Streaming Platform
Wasp2 - IoT and Streaming PlatformPaolo Platter
 
Open Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOCOpen Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOCSheetal Dolas
 
HSB15 - Pavel Minarik - INVEATECH
HSB15 - Pavel Minarik - INVEATECHHSB15 - Pavel Minarik - INVEATECH
HSB15 - Pavel Minarik - INVEATECHSplend
 

Similar a Kentik Network@Scale (Dan Ellis) (20)

Co se skrývá v datovém provozu? - Pavel Minařík
Co se skrývá v datovém provozu? - Pavel MinaříkCo se skrývá v datovém provozu? - Pavel Minařík
Co se skrývá v datovém provozu? - Pavel Minařík
 
DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and Analytics
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
 
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...
 
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Edward King SPEDDEXES 2014
Edward King SPEDDEXES 2014Edward King SPEDDEXES 2014
Edward King SPEDDEXES 2014
 
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Network Situational Awareness with d00gle
Network Situational Awareness with d00gleNetwork Situational Awareness with d00gle
Network Situational Awareness with d00gle
 
The Background Noise of the Internet
The Background Noise of the InternetThe Background Noise of the Internet
The Background Noise of the Internet
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
 
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics EcosystemXDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
 
From Device to Data Center to Insights
From Device to Data Center to InsightsFrom Device to Data Center to Insights
From Device to Data Center to Insights
 
Splunk Conf2010: Corporate Express presents Splunk with SAP
Splunk Conf2010: Corporate Express presents Splunk with SAPSplunk Conf2010: Corporate Express presents Splunk with SAP
Splunk Conf2010: Corporate Express presents Splunk with SAP
 
Wasp2 - IoT and Streaming Platform
Wasp2 - IoT and Streaming PlatformWasp2 - IoT and Streaming Platform
Wasp2 - IoT and Streaming Platform
 
Open Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOCOpen Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOC
 
HSB15 - Pavel Minarik - INVEATECH
HSB15 - Pavel Minarik - INVEATECHHSB15 - Pavel Minarik - INVEATECH
HSB15 - Pavel Minarik - INVEATECH
 

Último

Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsSachinPawar510423
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptJasonTagapanGulla
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...Amil Baba Dawood bangali
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 

Último (20)

Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documents
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.ppt
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.ppt
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 

Kentik Network@Scale (Dan Ellis)

  • 1. Network Traffic Visibility and Anomaly Detection @Scale: October 27th, 2016 Dan Ellis
  • 2. • What is network traffic visibility? • My relationship with traffic tools • 20+ years of frustrations • ISP’s, CDNs, enterprises • Who is Kentik Goal of this talk: Make your life easier Introduction
  • 3. • Data networks can be compared to FedEx • Imagine FedEx without package tracking • Majority of data networks operate in this vacuum of visibility • Hard to believe? Problem is massive data scale, lack of tools, little network + systems collaboration Traffic Visibility Problem
  • 6. •Interface Volume (Mb/s, pps)? •Src/Dst IP+Port, ASN, BGP Path? A Poll
  • 7. A Poll •Interface Volume (Mb/s, pps)? •Src/Dst IP+Port, ASN, BGP Path? •IP, Port, ASN or Path Thresholds?
  • 8. ?Maybe there isn’t a traffic visibility problem Maybe no one really needs this data Is this really a problem?
  • 9. 9 Complaints of high latency… BGP Path to Comcast
  • 10. 10 Dyn attack last week – ISP recursive inbound
  • 11. Dyn attack last week – Traffic / source_ip
  • 12. Dyn attack last week – ISP recursive outbound
  • 13. Use cases of traffic visibility • Network Planning • Peering Analytics and Abuse • Congestion detection • Is it the network? • Where on the network? • Proactive alerting • Distributed DDoS Detection • What Changed Post Deploy? • Security and Breach Detection • Cost Analytics • Revenue Identification (New + Risk) • Enabling Internal Groups
  • 14. Tenets • Infinite granularity storage for months • Drillable visibility, network specific UI • Real-time and fast (< 10s queries) • Anomaly detection + actions • Open / API • Scale
  • 15. Now we know what we need, how do we do it?
  • 16. TCP stats data / app specific data Where to find this data ? Flow data NetFlow, SFlow, IPFIX SNMP interfaces info Sys/Event logs TACACS or Syslog App Server BGP Path info NETWORK + + + = Something actually useful + Router Router PCAP agent
  • 17. What kind of tools • Current Open Source: • Older Open Source: • Commercial software: • DIY Big Data: • On-Prem Big Data: • SaaS Big Data: pmacct, ntop, SiLK, cacti cflowd, AS-PATH, RRDtool Arbor, Plixer, SevOne, Solarwinds, ManageEngine Kafka + ELK, Hadoop, druid, grafana, tsdb Cisco Tetration, Deepfield… Kentik, Datadog, Appneta, Splunk
  • 18. 18 Many tools gets you almost there
  • 19. Really though… (tools) Open source (ish): • Pmacct • Nprobe / Ntop • Elastic Search + Kibana (ELK) Commercial: • Arbor • Kentik
  • 20. Three layer approach Ingest & Fusion layer Storage Layer (flow specific) Query Layer Each layer has separate and different scaling characteristics Query engine and UI Query interfaces SQL WWW REST data sources clients SELECT flow FROM router WHERE … >_
  • 21. Seriously? How much data • Small network (< 10Gb/s traf.) • Large network (1 Tb/s traf.) • Querying over 30+ days 10k flows/sec (+rows/sec) 500k flows/sec @ 200k fps (518 B rows, 207 TB) in < 10s
  • 22. Data fusion is a key enabler to useful data
  • 23. Distributed system data ingest Proxy DATA FUSION Decoder Modules Mem Tables NetFlow v5 NetFlow v9 IPFIX BGP RIB Custom Tags SNMP Poller BGP Daemon Enrichment DB DATA FUSION Geo ß à IP ASN ß à IP SFlow ROUTER FLOW FRIENDLY DATASTORE Single flow fused row sent to storage Fusing should be near real-time, performed at ingest and data specific
  • 24. Proxy Agent CLIENT PROCESS Decoder Modules Mem Tables NetFlow v5 NetFlow v9 IPFIX Unified Flow BGP RIB Custom Tags SNMP Poller BGP Daemon Enrichment DB DATA FUSION Geo ß à IP ASN ß à IP SFlow ROUTER FLOW FRIENDLY DATASTORE Single flow fused row sent to storage Distributed system data ingest Serious. Flux Capacitor. Is. Serious.
  • 26. Network planning: collapsed path, exclude 1st
  • 27. Looking at existing architectures out there
  • 28. PMAcct-based implementation BGP Flow PCAP Ingest & Fusion layer Storage Layer Query Layer data sources clients frontend
  • 29. Nprobe + Ntop + ElasticSearch BGP Flow nProbe nProbe(s) Ingest & Fusion layer Storage Layer Query Layer data sources clients BGP Flow
  • 30. • Dropbox implementation of a (mostly) open-source NetFlow solution here: Dropbox blog • Requires custom ingest, fusing, UI Kafka + Elastic Search
  • 31. • Ingest: • Data-store: • Query frontends very generic: Distributing and scaling (1xNProbe = 1xDevice) No SNMP (= no IF info available for fusion) Aggregation (no infinite granularity) Challenging at scale when ES very hard for MySQL/MongoDB Tailoring of meaningful dashboards difficult No anomaly detection Caveats Commercial HW solutions (Arbor) Appliance based not truly distributed pre-determined list of aggregated data (no infinite granularity)
  • 33. Why isn’t everybody already doing it? Required areas of expertise (because every presentation needs a Vin diagram) Distributed systems engineers Network Engineers SREs Low-level Network developers Resilience / Reliability Geo-distributed ingest Flow friendly data-store BGP Daemon Flow inspection & conversion Network protocols hacking Make all of the above work reliably Train all the other teams on the involved network protocols and their usage Unicorn
  • 35. Once you have a platform, what’s next? • Augmented flow (retransmits, latency, URL, DNS) • Anomaly detection • Multi-hop exit determination • BGP-path congestion detection Building on top of the platform
  • 36. Imagine if we could get performance data from the network: • Q Depth • Retransmits per flow • TCP latency • Application Latency You can: • Nprobe (ntop) collects Latency, Rxmits, URL, DNS -> IPFIX flow • Deploy on a host or a sensor • Cisco, Juniper, Arista working to expose Q Depth into flow • Elastic Search, Plixer, Kentik can consume additional IPFIX fields Building on top of the platform
  • 37. Retransmits enhanced flow: rexmits / interface
  • 38. Retransmits enhanced flow: rexmits / ASN
  • 39. Retransmits enhanced flow: TCP latency / ASN
  • 40. You shouldn’t have to stare at dashboards or watch logs to detect badness Monitor top-x of any dimension combination (IP, ASN’s, Geo, Interface) Create baselines based on time of day Be able to look at things beyond pps/bps such as retransmits, latency, logs Detect shifts: did an ASN or IP on a particular interface suddenly move from top-x #200 to #2 and that is unusual for this time of day This is available today (Open Source: Hadoop, Spark, Storm, Samza, Flink) Building on top of the platform
  • 41. Use case: anomaly detection Traffic from one ASN (network) unusually high. Operator notified at red line.
  • 42. Use case: traffic anomaly detection & annotation
  • 43. Use case: traffic annotated w/ multiple events
  • 44. Anomaly detection: DDoS detection & characteristics
  • 45. Once you have a platform, what’s next? üAugmented flow (retransmits, latency, URL, DNS) üAnomaly detection q Multi-hop exit determination Challenging to map traffic from ingest to exit point, multi-hop q BGP-path congestion detection Detect individual congested paths within a circuit that isn’t congested Building on top of the platform
  • 46. Summary Networks can produce large amounts of data that will make your life easier Big Data platforms are able to consume this data Specific tools for Network Operators are beginning to appear (free & paid) Paid tools are more specific to network use (UI, easy setup, etc) Free tools have the “power” but require cobbling together pieces Much work to be done re fusing data such as logs, changes, alerts, DNS SaaS providers will provide community views and enable data-sharing