Building a Dynamic DSL Rules Engine with Kafka Streams
Will LaForest, Field CTO
Michael Peacock, Field Engineer
Why Something Other Than These?
(Frameworks shown on the slide include ksqlDB.)
Stream Processing Frameworks Are General Purpose
Stream processing is very powerful
BUT
● Uses general-purpose languages (Java, Python) or SQL
● 99.2% of the workforce are NOT coders
● Maybe SQL is easier
○ Still general purpose
○ Impractical for some domains
Technical Challenges
• Is the volume of rules high (100s or 1000s)?
• If each rule is its own stream processing job:
• Each job is a consumer and a producer
• The same network IO is repeated 1000x
• Process overhead
• Serialization/deserialization overhead
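The cost of running one job per rule can be illustrated with a small sketch. It is not Kafka code: the rules, the sample records, and both function names are assumptions made up for illustration, and it only counts how often each record is deserialized under the two designs.

```python
import json

# Hypothetical rules: each is a (field, expected_value) equality check.
RULES = [("proto", "udp"), ("id_resp_p", 53), ("qtype", 1)]

# Hypothetical serialized records, as they would arrive from a topic.
RAW_RECORDS = [
    b'{"proto": "udp", "id_resp_p": 53, "qtype": 1}',
    b'{"proto": "tcp", "id_resp_p": 443, "qtype": 0}',
]


def job_per_rule(raw_records, rules):
    """One 'job' per rule: every job reads and deserializes every record."""
    deserializations, matches = 0, 0
    for field, expected in rules:  # stands in for one consumer per rule
        for raw in raw_records:
            record = json.loads(raw)
            deserializations += 1
            matches += record.get(field) == expected
    return deserializations, matches


def single_pass(raw_records, rules):
    """One shared job: each record is deserialized once, then checked against all rules."""
    deserializations, matches = 0, 0
    for raw in raw_records:
        record = json.loads(raw)
        deserializations += 1
        for field, expected in rules:
            matches += record.get(field) == expected
    return deserializations, matches
```

Both functions find the same matches, but the per-rule design deserializes each record once per rule, which is the overhead the slide points at when rules number in the hundreds or thousands.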
Technical Challenges
• Multiple dynamic output topics?
• Is the change rate of the ruleset high?
• Need to minimize time to processing?
• Must be dynamic
• Don’t compile and deploy
• Don’t create a new stream processor
• Don’t restart the topology if possible
Anyone know what this is?
to spiral
make "n 1
while [:n < 100] [
make "n :n + 5
fd :n rt 90
]
end
Domain Specific Languages
Purpose built for a domain
● HTML for web pages
● Regex for patterns
● MATLAB for numerical analysis
● GraphQL for querying APIs
● Sigma for cyber and log data
If serialized as JSON, YAML, or XML
● DSL as Data
● Apply data-oriented operations
● Validate, secure, govern as data
● Can be transformed easily
● Can be generated from a domain-specific tool
● Dynamic
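A minimal sketch of "validate as data", assuming a rule serialized as JSON (chosen here only because it needs no third-party parser; Sigma itself uses YAML). The required key set and the function name `validate_rule` are assumptions for illustration, not the Sigma schema.

```python
import json

# Assumed minimal schema for illustration; a real DSL would define its own.
REQUIRED_KEYS = {"title", "logsource", "detection"}


def validate_rule(rule_text):
    """Parse a JSON-serialized rule and return a list of problems (empty if valid)."""
    try:
        rule = json.loads(rule_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(rule, dict):
        return ["rule must be a JSON object"]
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - rule.keys())]
    if "detection" in rule and not isinstance(rule["detection"], dict):
        problems.append("detection must be an object")
    return problems
```

Because the rule is plain data, a check like this can run before the rule ever reaches a stream processor, for example in the tool that publishes rules.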
The Cybersecurity Domain
A Typical SIEM Platform Architecture
[Diagram labels: Network traffic, Firewall logs, RDBMS, Application logs, HTTP proxy logs; Agents / Collectors; SIEM Tool; Search and investigate; Monitor and analyze; Reports and alerts]
Architecture with Kafka
[Diagram labels: Syslog, Network traffic, Firewall logs, RDBMS, Application logs, HTTP proxy logs; Machine Data via spooldir (files), SNMP Traps, Databases, SFTP, MQs, S2S; Curate; Real-time Detection; AI/ML; APP; SIEM; Index; SOAR; Forensic Archive; HDFS, S3, Big Query; QRadar, Arcsight, Splunk, Elastic]
Confluent Sigma
https://github.com/confluentinc/confluent-sigma
What is Sigma?
https://github.com/SigmaHQ/sigma
● Sigma is an open signature (pattern) format
● Focused on log and network data
● Researchers or analysts can describe the detection patterns they develop
● Shareable with others (IMPORTANT)
● Rules are schema agnostic
● Platform agnostic
○ The same rule can be used for Elastic, Splunk, Azure Sentinel (and now Kafka)
Confluent Sigma
[Diagram labels: Sigma Rules (YAML); Rules; Source Data; Data; Sigma Stream Processor; High Risk Detections; Filtered; SIEM Applications; SOAR; Cold storage; Cold retention]
Why Kafka Streams API?
● Simple to build a topology
● Easy to run as app
● No required runtime
● Start small easily
● Scale massively
● All you need is Kafka
● Can run anywhere!
A Walkthrough of the Sigma Project
sigma-parser
sigma-streams
sigma-streams-ui
https://github.com/confluentinc/confluent-sigma
Sigma Stream Topologies
Simple Topology: one sub-topology supports many rules
Aggregate Topology: requires one sub-topology for each rule
Simple Topology
● Built on KStream flatMapValues: "Create a new KStream by transforming the value of each record in this stream into zero or more values with the same key in the new stream."
● Iterates through each rule for the processor (product/service filtering)
● Validates the streaming data against the DSL rule and adds the result to the output list if a match is found
● An optional config variable controls whether to return after the first match or return all matches
● Writes matches to the dynamic output topic
Streaming data (excerpt):
{
"@stream": "dns",
"@system": "bobs.bigwheel.local",
"@proc": "zeek",
"ts": 1588205199.82437,
"uid": "Cvf4XX17hSAgXDdGEd",
"id_orig_h": "10.0.1.6",
"id_orig_p": 54243,
"id_resp_h": "10.0.0.4",
"id_resp_p": 53,
"proto": "udp",
"trans_id": 41180,
"rtt": 0.001528024673461914,
"query": "newyork.dmevals.local",
"qclass": 1,
"qclass_name": "C_INTERNET",
"qtype": 1,
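The per-record logic of the simple topology can be sketched outside Kafka as a plain function that returns zero or more detections per record, which is the shape flatMapValues expects. This is an illustrative sketch only: the rule fields (`service`, `selection`, `output_topic`), the sample rules, and the name `match_rules` are assumptions, not the confluent-sigma implementation.

```python
def match_rules(record, rules, first_match_only=False):
    """Return zero or more detections for one record (a flatMapValues-style step)."""
    detections = []
    for rule in rules:
        # Product/service filtering: skip rules written for another log source.
        if rule["service"] != record.get("@stream"):
            continue
        # Simplified matching: every selected field must equal the expected value.
        if all(record.get(field) == value for field, value in rule["selection"].items()):
            detections.append(
                {"rule": rule["title"], "topic": rule["output_topic"], "record": record}
            )
            if first_match_only:
                break  # optional config: stop after the first matching rule
    return detections


# Hypothetical rules and a record shaped like the DNS sample on the slide.
RULES = [
    {"title": "dns over udp", "service": "dns",
     "selection": {"proto": "udp", "id_resp_p": 53}, "output_topic": "dns-detections"},
    {"title": "internet class query", "service": "dns",
     "selection": {"qclass_name": "C_INTERNET"}, "output_topic": "dns-detections"},
    {"title": "http rule", "service": "http",
     "selection": {"proto": "tcp"}, "output_topic": "http-detections"},
]
RECORD = {"@stream": "dns", "proto": "udp", "id_resp_p": 53,
          "qclass_name": "C_INTERNET", "query": "newyork.dmevals.local"}
```

Each detection carries its own `topic` value, which is one way to express a dynamic output topic: the destination is chosen per record from the rule rather than fixed when the topology is built.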
Detection Results / Dynamic Output Topic
Sigma Rule regex (firewalls topic):
'^(?<timestamp>\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2})\s(?<hostname>[^\s]+)\s%ASA-\d-(?<messageID>[^:]+):\s(?<action>[^\s]+)\s(?<protocol>[^\s]+)\ssrc\sinside:(?<src>[0-9.]+)/(?<srcport>[0-9]+)\sdst\soutside:(?<dest>[0-9.]+)/(?<destport>[0-9]+)'
● Adds the record data and rule metadata
● If the rule contains a regex with mapping, adds the mapped fields
● Adds any custom fields
● Writes the result to the dynamic output topic
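A sketch of how a regex with named groups can supply the mapped fields of a detection result. The pattern is a reading of the slide's regex with the `\w`, `\s`, and `\d` escapes assumed, rewritten in Python's `(?P<name>...)` group syntax. The function name `build_detection`, the result layout, and the sample log line used with it are assumptions for illustration.

```python
import re

# Assumed reconstruction of the slide's pattern, using Python named groups.
ASA_PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2})\s(?P<hostname>[^\s]+)"
    r"\s%ASA-\d-(?P<messageID>[^:]+):\s(?P<action>[^\s]+)\s(?P<protocol>[^\s]+)"
    r"\ssrc\sinside:(?P<src>[0-9.]+)/(?P<srcport>[0-9]+)"
    r"\sdst\soutside:(?P<dest>[0-9.]+)/(?P<destport>[0-9]+)"
)


def build_detection(raw_line, rule_meta, custom_fields=None):
    """Combine record data, rule metadata, regex-mapped fields, and custom fields."""
    match = ASA_PATTERN.match(raw_line)
    if match is None:
        return None  # the rule's regex did not match this record
    detection = {"record": raw_line, "rule": rule_meta}
    detection.update(match.groupdict())    # mapped fields from the named groups
    detection.update(custom_fields or {})  # any custom fields defined for the rule
    return detection
```

For a hypothetical line such as `Apr 15 12:34:56 fw01 %ASA-6-302013: Built tcp src inside:10.0.1.6/54243 dst outside:10.0.0.4/53`, the named groups become top-level fields (`src`, `srcport`, `dest`, `destport`, and so on) alongside the original record and rule metadata.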
Aggregate Topology
● Iterates through each rule for the processor (product/service filtering)
● Validates the streaming data against the DSL rule
● Groups by the key
● Applies the sliding window defined in the rule
● Counts the number of instances within the window
● Filters based on the aggregation and operation in the rule
● Writes matches to the dynamic output topic
Streaming data (excerpt):
{
"@stream": "dns",
"@system": "bobs.bigwheel.local",
"@proc": "zeek",
"ts": 1588205199.82437,
"uid": "Cvf4XX17hSAgXDdGEd",
"id_orig_h": "10.0.1.6",
"id_orig_p": 54243,
"id_resp_h": "10.0.0.4",
"id_resp_p": 53,
"proto": "udp",
"trans_id": 41180,
"rtt": 0.001528024673461914,
"query": "newyork.dmevals.local",
"qclass": 1,
"qclass_name": "C_INTERNET",
"qtype": 1,
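The group, window, count, and filter steps can be sketched with an in-memory counter. This is a simplification, not Kafka Streams sliding-window semantics: it assumes timestamps arrive in order per key, keeps state in a Python dict instead of a state store, and hard-codes a "count greater than threshold" operation. The class name and parameters are assumptions for illustration.

```python
from collections import defaultdict, deque


class WindowedCounter:
    """Counts rule matches per key inside a trailing time window."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._seen = defaultdict(deque)  # key -> timestamps still inside the window

    def add(self, key, ts):
        """Record one match; return True when the windowed count exceeds the threshold."""
        times = self._seen[key]
        times.append(ts)
        # Evict timestamps that have fallen out of the window ending at ts.
        while times and times[0] <= ts - self.window_seconds:
            times.popleft()
        return len(times) > self.threshold
```

A rule such as "more than 2 matching DNS queries from one source within 10 seconds" would map to `WindowedCounter(window_seconds=10, threshold=2)` keyed by the source address.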
Confluent Sigma Demo
[Diagram labels: Zeek Data (DNS, CONN, DHCP, HTTP, SSL, x509); dns topic; sigma rules topic; sigma rules cache; Sigma Stream Processors (rule parsing, filtering, aggregation, windowing); dns detections topic; Sigma Streams UI; Sigma Rule Editor]
Demo
Considerations for DSL Stream Processing
● Consider how to handle newly arriving rules
○ Apply only to unprocessed messages
○ What about previous data?
■ Spawn a separate app or topology
■ When “caught up”, merge into a single one
● Versioned state stores now allow a new strategy
○ For each record, join only against rules that existed at record time
■ Upcoming KIP-960, KIP-968, KIP-969 for our customized joins
https://www.confluent.io/blog/introducing-versioned-state-store-in-kafka-streams/
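The "rules that existed at record time" idea can be sketched as a timestamped lookup. This is an in-memory stand-in, not the Kafka Streams versioned state store API: the class name `VersionedRule` and its methods are assumptions for illustration, and a stored value of `None` would represent a deleted rule.

```python
import bisect


class VersionedRule:
    """Keeps every version of one rule, ordered by the time it became valid."""

    def __init__(self):
        self._timestamps = []  # sorted validity start times
        self._versions = []    # rule version valid from the matching timestamp

    def put(self, ts, rule):
        """Store a rule version that becomes valid at time ts."""
        i = bisect.bisect_right(self._timestamps, ts)
        self._timestamps.insert(i, ts)
        self._versions.insert(i, rule)

    def get_as_of(self, ts):
        """Return the version valid at time ts, or None if no version existed yet."""
        i = bisect.bisect_right(self._timestamps, ts)
        return self._versions[i - 1] if i else None
```

A record with timestamp 150 would be evaluated against the version stored at 100 even if a newer version arrived at 200, so late or replayed records see the ruleset as it was when they were produced.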
Considerations Continued
● Consider how routines may interact with each other
○ Sigma doesn’t support a notion of priority or dependence
● For a DSL in which rules interact (IFTTT)
○ Use the Processor API to map to an appropriate DAG
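For a DSL whose rules depend on each other, one preliminary step is to order the rules as a DAG before wiring processors together. The sketch below uses Python's standard `graphlib` (3.9+) purely to show that ordering; the rule names and the `DEPENDS_ON` mapping are hypothetical, and mapping the result onto Processor API nodes is left out.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: rule name -> set of rules whose output it consumes.
DEPENDS_ON = {
    "enrich_geo": set(),
    "flag_rare_country": {"enrich_geo"},
    "open_ticket": {"flag_rare_country"},
}


def evaluation_order(depends_on):
    """Return rule names so that every rule comes after the rules it depends on."""
    # static_order() raises graphlib.CycleError if the rules depend on each other in a loop.
    return list(TopologicalSorter(depends_on).static_order())
```

The resulting order is one candidate for how downstream processors could be chained, and a cycle error is an early signal that the ruleset cannot be expressed as a DAG.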
What's Next?
“Roadmap” https://github.com/confluentinc/confluent-sigma/projects/1
● Refactoring the framework to make the DSL pluggable (ongoing)
○ Implement a builder class: DSL routines -> Kafka Streams topology
○ Re-use all the scaffolding we have built
○ LLM stream processing
● Optimizations
○ Aggregate topology optimizations (bin based on time and key)
○ More graceful management of aggregate topology changes
■ Self coordination
○ General profiling and optimization
● Switch to GlobalKTable from KCache