SlideShare a Scribd company logo
1 of 26
Download to read offline
THE POWER OF
INTELLIGENT FLOWS
REAL-TIME IOT BOTNET CLASSIFICATION WITH APACHE NIFI
Andre Fucs de Miranda - Fluenda
Andy LoPresto - Hortonworks
Agenda
- Who are the two blokes in front of you
- A brief prologue
- Logs! Logs! Logs!
- The challenge
- The solution
- Wrapping up
Who are the two blokes in front of you
Andre Fucs de Miranda



- Nearly 20 years working with
information cyber security



- Logging aficionado (i.e. security
data engineer)



- Apache NiFi PMC Member



@trixpan @trixpan
Andy LoPresto



- Financial security & device
firmware at Apple, TigerText, etc.



- PII, PCI & EPHI encryption &
cracking



- Apache NiFi PMC Member



@yolopey @alopresto
A brief prologue
The Botnet Kill Chain & the Honeypot
Reconnaissanc
e
Weaponization Delivery Exploitation Installation C & C
Actions on
Objective
Botnet

developer
Low /
Medium
interaction
honeypot

High
interaction 

--

Sandboxing
The Botnet Kill Chain & the Honeypot
Delivery Exploitation Installation
Step 1
Logon to
system
Step 2
Execute
predefined
sequence of
commands
Step 3
Try to install
some sort of
persistence
The demo environment
- A handful of EC2 instances
running:
- Cowrie - Medium interaction
SSH / Telnet honeypot
- MiNiFi

- An EC2 instance running:
- NiFi 1.3.0 (with security
enabled)
Flow Design Approach
" Don’t be prescriptive
" Treat everything as data
" Don’t be limited by prior
expectations
" Start from the end
Logs! Logs! Logs!
MiNiFi Process Group
" Tailing a log file being written by
cowrie
" Pushing to Amazon S3
○ Could stream via NiFi Site
to Site
○ MiNiFi extensibility
○ Shows multiple capabilities
○ Decoupled/no lock in
The data being ingested
- Cowrie logs include:
- Username / Password
- Commands executed (and parameters)
- Files downloaded
- Single line JSON entries
- Easy to parse
- Textbook machine readable log format
- Perfect match to NiFi processors such as:
- SplitText
- EvaluateJSONPath
Cowrie log example
Simple Cowrie log ingestion with NiFi
The challenge
The challenge
The challenge
- Logs in isolation rarely will provide the reader with a
meaningful view over what is happening

- Verbosity means sensors generate lots of “events”, but
who cares about a bot trying to `cat /proc/mounts` ?

- Bots use semi-random values to make detection more
difficult.
The solution
Logs are data too...
This looks familiar...
Locality-sensitive hashing
A type of algorithm that can be used to “group” similar items together
and may provide a similarity score between two particular items.
Areas of application:
- Genome-wide association study
- Anti-spam (e.g. TLSH, Spamsum/SSDeep)
- Near-duplicate detection
- etc
NiFi + SpamSum + TLSH = WIN!
Wrapping up
Key points
- Treat everything as data
- Be flexible on how you build your data flows.
- Apparently unrelated domains may speed up your results
- Use MiNiFi to aggregate data at the edge whenever possible
- NiFi rocks!*

* Disclaimer: We may be a bit
Future Steps
" Automate IP blocking & firewall rules (ML)
" Continuously update signature definition list with new
sigs
" Analyze epidemiology & spread vectors
" Follow evolution of malware families
" Support attribution of samples
Further reading
Mysterious Hajime botnet has pwned 300,000 IoT devices

https://www.theregister.co.uk/2017/04/27/hajime_iot_botnet/
Identifying unknown files by using fuzzy hashing

https://www.honeynet.org/node/811
Classifying Malware using Import API and Fuzzy Hashing – impfuzzy

http://blog.jpcert.or.jp/2016/05/classifying-mal-a988.html
Template and samples:

https://github.com/fluenda/dataworks_summit_iot_botnet
Thank you

More Related Content

What's hot

LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BIDataWorks Summit
 
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...DataWorks Summit
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBryan Bende
 
IoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFiIoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFiDataWorks Summit
 
Storage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesStorage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesDataWorks Summit
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataWorks Summit
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0DataWorks Summit
 
Devnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFiDevnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFiBryan Bende
 
NiFi Developer Guide
NiFi Developer GuideNiFi Developer Guide
NiFi Developer GuideDeon Huang
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks
 
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightZero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightDataWorks Summit
 
Seattle spark-meetup-032317
Seattle spark-meetup-032317Seattle spark-meetup-032317
Seattle spark-meetup-032317Nan Zhu
 
Big Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and Parquet
Big Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and ParquetBig Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and Parquet
Big Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and ParquetDataWorks Summit
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsTimothy Spann
 
Mission to NARs with Apache NiFi
Mission to NARs with Apache NiFiMission to NARs with Apache NiFi
Mission to NARs with Apache NiFiHortonworks
 

What's hot (20)

LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BI
 
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFi
 
IoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFiIoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFi
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
From Device to Data Center to Insights
From Device to Data Center to InsightsFrom Device to Data Center to Insights
From Device to Data Center to Insights
 
Storage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesStorage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on Kubernetes
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
Devnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFiDevnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFi
 
NiFi Developer Guide
NiFi Developer GuideNiFi Developer Guide
NiFi Developer Guide
 
Building a Smarter Home with Apache NiFi and Spark
Building a Smarter Home with Apache NiFi and SparkBuilding a Smarter Home with Apache NiFi and Spark
Building a Smarter Home with Apache NiFi and Spark
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
 
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightZero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsight
 
Seattle spark-meetup-032317
Seattle spark-meetup-032317Seattle spark-meetup-032317
Seattle spark-meetup-032317
 
Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
 
Big Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and Parquet
Big Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and ParquetBig Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and Parquet
Big Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and Parquet
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
 
Mission to NARs with Apache NiFi
Mission to NARs with Apache NiFiMission to NARs with Apache NiFi
Mission to NARs with Apache NiFi
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFi
 

Similar to Real-Time IoT Botnet Classification with Apache NiFi and Locality-Sensitive Hashing

iShelf - Inventory management project
iShelf - Inventory management projectiShelf - Inventory management project
iShelf - Inventory management projectFrancescoCassini
 
Basics of tcp ip
Basics of tcp ipBasics of tcp ip
Basics of tcp ipKumar
 
Kamailio World 2018: Having fun with new stuff
Kamailio World 2018: Having fun with new stuffKamailio World 2018: Having fun with new stuff
Kamailio World 2018: Having fun with new stuffOlle E Johansson
 
Teensy Programming for Everyone
Teensy Programming for EveryoneTeensy Programming for Everyone
Teensy Programming for EveryoneNikhil Mittal
 
Bsides Tampa Blue Team’s tool dump.
Bsides Tampa Blue Team’s tool dump.Bsides Tampa Blue Team’s tool dump.
Bsides Tampa Blue Team’s tool dump.Alexander Kot
 
Lares from LOW to PWNED
Lares from LOW to PWNEDLares from LOW to PWNED
Lares from LOW to PWNEDChris Gates
 
Detecting Web Browser Heap Corruption Attacks - Stephan Chenette, Moti Joseph...
Detecting Web Browser Heap Corruption Attacks - Stephan Chenette, Moti Joseph...Detecting Web Browser Heap Corruption Attacks - Stephan Chenette, Moti Joseph...
Detecting Web Browser Heap Corruption Attacks - Stephan Chenette, Moti Joseph...Stephan Chenette
 
Jaime Blasco - Fighting Advanced Persistent Threat (APT) with Open Source Too...
Jaime Blasco - Fighting Advanced Persistent Threat (APT) with Open Source Too...Jaime Blasco - Fighting Advanced Persistent Threat (APT) with Open Source Too...
Jaime Blasco - Fighting Advanced Persistent Threat (APT) with Open Source Too...RootedCON
 
Honeypots.ppt1800363876
Honeypots.ppt1800363876Honeypots.ppt1800363876
Honeypots.ppt1800363876Momita Sharma
 
MachinePulse at the November Open Hardware Meetup, Mumbai 2014
MachinePulse at the November Open Hardware Meetup, Mumbai 2014MachinePulse at the November Open Hardware Meetup, Mumbai 2014
MachinePulse at the November Open Hardware Meetup, Mumbai 2014MachinePulse
 
DEF CON 27 - DANIEL ROMERO and MARIO RIVAS - why you should fear your mundane...
DEF CON 27 - DANIEL ROMERO and MARIO RIVAS - why you should fear your mundane...DEF CON 27 - DANIEL ROMERO and MARIO RIVAS - why you should fear your mundane...
DEF CON 27 - DANIEL ROMERO and MARIO RIVAS - why you should fear your mundane...Felipe Prado
 
Shmoocon 2013 - OpenStack Security Brief
Shmoocon 2013 - OpenStack Security BriefShmoocon 2013 - OpenStack Security Brief
Shmoocon 2013 - OpenStack Security Briefopenfly
 
What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)Brian Brazil
 

Similar to Real-Time IoT Botnet Classification with Apache NiFi and Locality-Sensitive Hashing (20)

iShelf - Inventory management project
iShelf - Inventory management projectiShelf - Inventory management project
iShelf - Inventory management project
 
The LabRat - Physical backdoor hacks and IOT primer
The LabRat - Physical backdoor hacks and IOT primerThe LabRat - Physical backdoor hacks and IOT primer
The LabRat - Physical backdoor hacks and IOT primer
 
Data analysis with pandas
Data analysis with pandasData analysis with pandas
Data analysis with pandas
 
Data Analysis With Pandas
Data Analysis With PandasData Analysis With Pandas
Data Analysis With Pandas
 
Routing_Article
Routing_ArticleRouting_Article
Routing_Article
 
Basics of tcp ip
Basics of tcp ipBasics of tcp ip
Basics of tcp ip
 
Kamailio World 2018: Having fun with new stuff
Kamailio World 2018: Having fun with new stuffKamailio World 2018: Having fun with new stuff
Kamailio World 2018: Having fun with new stuff
 
Teensy Programming for Everyone
Teensy Programming for EveryoneTeensy Programming for Everyone
Teensy Programming for Everyone
 
OS Fingerprinting
OS FingerprintingOS Fingerprinting
OS Fingerprinting
 
Bsides Tampa Blue Team’s tool dump.
Bsides Tampa Blue Team’s tool dump.Bsides Tampa Blue Team’s tool dump.
Bsides Tampa Blue Team’s tool dump.
 
2015 moloch recipes
2015 moloch recipes2015 moloch recipes
2015 moloch recipes
 
Lares from LOW to PWNED
Lares from LOW to PWNEDLares from LOW to PWNED
Lares from LOW to PWNED
 
Detecting Web Browser Heap Corruption Attacks - Stephan Chenette, Moti Joseph...
Detecting Web Browser Heap Corruption Attacks - Stephan Chenette, Moti Joseph...Detecting Web Browser Heap Corruption Attacks - Stephan Chenette, Moti Joseph...
Detecting Web Browser Heap Corruption Attacks - Stephan Chenette, Moti Joseph...
 
Jaime Blasco - Fighting Advanced Persistent Threat (APT) with Open Source Too...
Jaime Blasco - Fighting Advanced Persistent Threat (APT) with Open Source Too...Jaime Blasco - Fighting Advanced Persistent Threat (APT) with Open Source Too...
Jaime Blasco - Fighting Advanced Persistent Threat (APT) with Open Source Too...
 
IoT 2.pptx
IoT 2.pptxIoT 2.pptx
IoT 2.pptx
 
Honeypots.ppt1800363876
Honeypots.ppt1800363876Honeypots.ppt1800363876
Honeypots.ppt1800363876
 
MachinePulse at the November Open Hardware Meetup, Mumbai 2014
MachinePulse at the November Open Hardware Meetup, Mumbai 2014MachinePulse at the November Open Hardware Meetup, Mumbai 2014
MachinePulse at the November Open Hardware Meetup, Mumbai 2014
 
DEF CON 27 - DANIEL ROMERO and MARIO RIVAS - why you should fear your mundane...
DEF CON 27 - DANIEL ROMERO and MARIO RIVAS - why you should fear your mundane...DEF CON 27 - DANIEL ROMERO and MARIO RIVAS - why you should fear your mundane...
DEF CON 27 - DANIEL ROMERO and MARIO RIVAS - why you should fear your mundane...
 
Shmoocon 2013 - OpenStack Security Brief
Shmoocon 2013 - OpenStack Security BriefShmoocon 2013 - OpenStack Security Brief
Shmoocon 2013 - OpenStack Security Brief
 
What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)What does "monitoring" mean? (FOSDEM 2017)
What does "monitoring" mean? (FOSDEM 2017)
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 

Recently uploaded (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Real-Time IoT Botnet Classification with Apache NiFi and Locality-Sensitive Hashing

  • 1. THE POWER OF INTELLIGENT FLOWS REAL-TIME IOT BOTNET CLASSIFICATION WITH APACHE NIFI Andre Fucs de Miranda - Fluenda Andy LoPresto - Hortonworks
  • 2. Agenda - Who are the two blokes in front of you - A brief prologue - Logs! Logs! Logs! - The challenge - The solution - Wrapping up
  • 3. Who are the two blokes in front of you Andre Fucs de Miranda
 
 - Nearly 20 years working with information cyber security
 
 - Logging aficionado (i.e. security data engineer)
 
 - Apache NiFi PMC Member
 
 @trixpan @trixpan Andy LoPresto
 
 - Financial security & device firmware at Apple, TigerText, etc.
 
 - PII, PCI & EPHI encryption & cracking
 
 - Apache NiFi PMC Member
 
 @yolopey @alopresto
  • 5. The Botnet Kill Chain & the Honeypot Reconnaissanc e Weaponization Delivery Exploitation Installation C & C Actions on Objective Botnet
 developer Low / Medium interaction honeypot
 High interaction 
 --
 Sandboxing
  • 6. The Botnet Kill Chain & the Honeypot Delivery Exploitation Installation Step 1 Logon to system Step 2 Execute predefined sequence of commands Step 3 Try to install some sort of persistence
  • 7. The demo environment - A handful of EC2 instances running: - Cowrie - Medium interaction SSH / Telnet honeypot - MiNiFi
 - An EC2 instance running: - NiFi 1.3.0 (with security enabled)
  • 8. Flow Design Approach " Don’t be prescriptive " Treat everything as data " Don’t be limited by prior expectations " Start from the end
  • 10. MiNiFi Process Group " Tailing a log file being written by cowrie " Pushing to Amazon S3 ○ Could stream via NiFi Site to Site ○ MiNiFi extensibility ○ Shows multiple capabilities ○ Decoupled/no lock in
  • 11. The data being ingested - Cowrie logs include: - Username / Password - Commands executed (and parameters) - Files downloaded - Single line JSON entries - Easy to parse - Textbook machine readable log format - Perfect match to NiFi processors such as: - SplitText - EvaluateJSONPath
  • 13. Simple Cowrie log ingestion with NiFi
  • 16. The challenge - Logs in isolation rarely will provide the reader with a meaningful view over what is happening
 - Verbosity means sensors generate lots of “events”, but who cares about a bot trying to `cat /proc/mounts` ?
 - Bots use semi-random values to make detection more difficult.
  • 18. Logs are data too...
  • 20. Locality-sensitive hashing A type of algorithm that can be used to “group” similar items together and may provide a similarity score between two particular items. Areas of application: - Genome-wide association study - Anti-spam (e.g. TLSH, Spamsum/SSDeep) - Near-duplicate detection - etc
  • 21. NiFi + SpamSum + TLSH = WIN!
  • 23. Key points - Treat everything as data - Be flexible on how you build your data flows. - Apparently unrelated domains may speed up your results - Use MiNiFi to aggregate data at the edge whenever possible - NiFi rocks!*
 * Disclaimer: We may be a bit
  • 24. Future Steps " Automate IP blocking & firewall rules (ML) " Continuously update signature definition list with new sigs " Analyze epidemiology & spread vectors " Follow evolution of malware families " Support attribution of samples
  • 25. Further reading Mysterious Hajime botnet has pwned 300,000 IoT devices
 https://www.theregister.co.uk/2017/04/27/hajime_iot_botnet/ Identifying unknown files by using fuzzy hashing
 https://www.honeynet.org/node/811 Classifying Malware using Import API and Fuzzy Hashing – impfuzzy
 http://blog.jpcert.or.jp/2016/05/classifying-mal-a988.html Template and samples:
 https://github.com/fluenda/dataworks_summit_iot_botnet