SlideShare una empresa de Scribd logo
1 de 25
Anomaly Detection Engine for
Cloud Activities using Flink
Flink Forward Berlin 2018
Yonatan Most & Avihai Berkovitz
Microsoft Cloud App Security
Microsoft Cloud App Security
Discover and
assess risks
Control access
in real time
Detect
threats
Protect your
information
Identify cloud apps on your
network, gain visibility into shadow
IT, and get risk assessments and
ongoing analytics.
Manage and limit cloud app
access based on conditions and
session context, including user
identity, device, and location.
Identify high-risk usage and
detect unusual behavior using
Microsoft threat intelligence
and research.
Get granular control over data
and use built-in or custom
policies for data sharing and
data loss prevention.
Threat detection: Microsoft Intelligent Security Graph, Office ATP
Information Protection: Office 365 & Azure Information Protection
Identity: Azure AD and Conditional Access
To your cloud appsExtend Microsoft security
+ more
Microsoft Cloud App Security
• The primary data type in the product
• 16,000+ supported SaaS applications
• Dozens of activity types (logon, upload, export, create user…)
• Locations, devices, target objects, impersonated users…
• Out of order by up to 24 hours!
Cloud activities
Cloud activities
• Goal: detect anomalous user behavior and alert the admin
• The core flow is:
• Analyze all the activities and extract security-oriented insights
• Maintain a behavioral model for every user, and update it inline
• Detect outliers, suspicious behavior or potentially malicious activity
• Cross-reference with previous alerts and other users, to prevent false positives and “alert fatigue”
• Reach a decision and raise an alert within seconds
Activity-based threat protection
• Scalability
• Fault tolerance & recovery
• Real time processing
• Short time from ingestion to
detection
• Extendibility
Detection engine requirements
• Support for massive state size
• State persistency
• State consistency
• Fast, parallel access and update of
state
Sounds familiar?
• We needed a stateful stream processing framework
• We really didn’t want to build one in-house
• We tested 10 frameworks
• Azure ML, Azure Stream Analytics, Microsoft Orleans, Apache Storm, Apache Samza, Apache Spark streaming, Apache Ignite, Apache Beam…
• We chose Flink!
We embarked on a search for a framework
Anubis
• Our anomaly detection engine
• A single Flink job running the entire flow
• Ingests activities
• Outputs alerts
• 8 clusters across multiple datacenters
• Our largest cluster:
• 40 machines
• 25,000 events per second
• 1.3 TB of state
Anubis
Anubis
User
model
Group
model
Events
Anubis scalability
Scalability in… Requires…
Event rate per cluster Increase cluster as needed (stability cost)
Event rate per user group
Key by user where possible
Special treatment of group-keyed operators
Event rate per user Capping the user event rate
Features models and detections
Key by feature / model / detection +
increase cluster as needed
Number of clusters Easy cluster management
• Multiple clusters in multiple datacenters
• Custom standalone cluster setup
• Automation scripts
• Machine setup
• Flink version upgrade
• Job deployment and upgrade
• Kubernetes-based deployment solution is in progress
Flink @ Microsoft - cluster
• We use Azure EventHubs as sources and sinks
• We use Azure Blob Storage for state backend
Flink @ Microsoft – connectors
• The job and its state should last forever
• We also deploy new versions frequently and update the state objects
• We created a custom versioned framework based on Kryo
Flink @ Microsoft – upgrades
• Custom wrappers for operators, state access, serializers, collectors
• Adding log metadata about current operator, context, timing
• Adding metrics using the built-in framework
• Around 15,000 metrics per process!
• Everything flows to Splunk
• Investigation
• Dashboards
• Alerts
Flink @ Microsoft – monitoring
Flink @ Microsoft – monitoring
Flink @ Microsoft – monitoring
Flink @ Microsoft – monitoring
Flink @ Microsoft – monitoring
Flink @ Microsoft – monitoring
Flink @ Microsoft – monitoring
• Two other Flink jobs already in production
• Another two in development
• Kubernetes-based deployment solution is in progress
• Helping other groups within Microsoft build Flink jobs
What’s next?
© Copyright Microsoft Corporation. All rights reserved.
Thank you.
Questions?

Más contenido relacionado

La actualidad más candente

Palestra de abertura: Evolução e visão do Elastic Observability
Palestra de abertura: Evolução e visão do Elastic ObservabilityPalestra de abertura: Evolução e visão do Elastic Observability
Palestra de abertura: Evolução e visão do Elastic ObservabilityElasticsearch
 
Search for all with Elastic Enterprise Search
Search for all with Elastic Enterprise Search Search for all with Elastic Enterprise Search
Search for all with Elastic Enterprise Search Elasticsearch
 
How KeyBank Used Elastic to Build an Enterprise Monitoring Solution
How KeyBank Used Elastic to Build an Enterprise Monitoring SolutionHow KeyBank Used Elastic to Build an Enterprise Monitoring Solution
How KeyBank Used Elastic to Build an Enterprise Monitoring SolutionElasticsearch
 
Monitoring real-life Azure applications: When to use what and why
Monitoring real-life Azure applications: When to use what and whyMonitoring real-life Azure applications: When to use what and why
Monitoring real-life Azure applications: When to use what and whyKarl Ots
 
Automate Your Container Deployments Securely
Automate Your Container Deployments SecurelyAutomate Your Container Deployments Securely
Automate Your Container Deployments SecurelyDevOps.com
 
Siscale Lightning Talk: Automated Root Cause Analysis with Elastic Stack
Siscale Lightning Talk: Automated Root Cause Analysis with Elastic StackSiscale Lightning Talk: Automated Root Cause Analysis with Elastic Stack
Siscale Lightning Talk: Automated Root Cause Analysis with Elastic StackElasticsearch
 
Infrastructure monitoring made easy, from ingest to insight
Infrastructure monitoring made easy, from ingest to insightInfrastructure monitoring made easy, from ingest to insight
Infrastructure monitoring made easy, from ingest to insightElasticsearch
 
Intelligent Cloud Conference 2018 - Building secure cloud applications with A...
Intelligent Cloud Conference 2018 - Building secure cloud applications with A...Intelligent Cloud Conference 2018 - Building secure cloud applications with A...
Intelligent Cloud Conference 2018 - Building secure cloud applications with A...Tom Kerkhove
 
Elasticsearch on Azure
Elasticsearch on AzureElasticsearch on Azure
Elasticsearch on AzureElasticsearch
 
The full picture of Openstack in real-time
The full picture of Openstack in real-timeThe full picture of Openstack in real-time
The full picture of Openstack in real-timeDynatrace
 
Monitoring
MonitoringMonitoring
Monitoringstrikr .
 
Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...
Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...
Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...Elasticsearch
 
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...Elasticsearch
 
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)DevOps.com
 
Your Agile, Modern Data Delivery Platform
Your Agile, Modern Data Delivery PlatformYour Agile, Modern Data Delivery Platform
Your Agile, Modern Data Delivery Platformsyed_javed
 
What’s Evolving in the Elastic Stack
What’s Evolving in the Elastic StackWhat’s Evolving in the Elastic Stack
What’s Evolving in the Elastic StackElasticsearch
 
Modern Web-site Development Pipeline
Modern Web-site Development PipelineModern Web-site Development Pipeline
Modern Web-site Development PipelineGlobalLogic Ukraine
 
Migrating a legacy logging system: Etsy’s journey to Elastic Cloud
Migrating a legacy logging system: Etsy’s journey to Elastic CloudMigrating a legacy logging system: Etsy’s journey to Elastic Cloud
Migrating a legacy logging system: Etsy’s journey to Elastic CloudElasticsearch
 
Search for All with Elastic Enterprise Search
Search for All with Elastic Enterprise Search Search for All with Elastic Enterprise Search
Search for All with Elastic Enterprise Search Elasticsearch
 
Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...
Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...
Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...Tom Kerkhove
 

La actualidad más candente (20)

Palestra de abertura: Evolução e visão do Elastic Observability
Palestra de abertura: Evolução e visão do Elastic ObservabilityPalestra de abertura: Evolução e visão do Elastic Observability
Palestra de abertura: Evolução e visão do Elastic Observability
 
Search for all with Elastic Enterprise Search
Search for all with Elastic Enterprise Search Search for all with Elastic Enterprise Search
Search for all with Elastic Enterprise Search
 
How KeyBank Used Elastic to Build an Enterprise Monitoring Solution
How KeyBank Used Elastic to Build an Enterprise Monitoring SolutionHow KeyBank Used Elastic to Build an Enterprise Monitoring Solution
How KeyBank Used Elastic to Build an Enterprise Monitoring Solution
 
Monitoring real-life Azure applications: When to use what and why
Monitoring real-life Azure applications: When to use what and whyMonitoring real-life Azure applications: When to use what and why
Monitoring real-life Azure applications: When to use what and why
 
Automate Your Container Deployments Securely
Automate Your Container Deployments SecurelyAutomate Your Container Deployments Securely
Automate Your Container Deployments Securely
 
Siscale Lightning Talk: Automated Root Cause Analysis with Elastic Stack
Siscale Lightning Talk: Automated Root Cause Analysis with Elastic StackSiscale Lightning Talk: Automated Root Cause Analysis with Elastic Stack
Siscale Lightning Talk: Automated Root Cause Analysis with Elastic Stack
 
Infrastructure monitoring made easy, from ingest to insight
Infrastructure monitoring made easy, from ingest to insightInfrastructure monitoring made easy, from ingest to insight
Infrastructure monitoring made easy, from ingest to insight
 
Intelligent Cloud Conference 2018 - Building secure cloud applications with A...
Intelligent Cloud Conference 2018 - Building secure cloud applications with A...Intelligent Cloud Conference 2018 - Building secure cloud applications with A...
Intelligent Cloud Conference 2018 - Building secure cloud applications with A...
 
Elasticsearch on Azure
Elasticsearch on AzureElasticsearch on Azure
Elasticsearch on Azure
 
The full picture of Openstack in real-time
The full picture of Openstack in real-timeThe full picture of Openstack in real-time
The full picture of Openstack in real-time
 
Monitoring
MonitoringMonitoring
Monitoring
 
Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...
Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...
Elastic APM : développez vos logs et vos indicateurs pour obtenir une vue com...
 
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...
Elastic APM: amplificação dos seus logs e métricas para proporcionar um panor...
 
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
Getting Started with Runtime Security on Azure Kubernetes Service (AKS)
 
Your Agile, Modern Data Delivery Platform
Your Agile, Modern Data Delivery PlatformYour Agile, Modern Data Delivery Platform
Your Agile, Modern Data Delivery Platform
 
What’s Evolving in the Elastic Stack
What’s Evolving in the Elastic StackWhat’s Evolving in the Elastic Stack
What’s Evolving in the Elastic Stack
 
Modern Web-site Development Pipeline
Modern Web-site Development PipelineModern Web-site Development Pipeline
Modern Web-site Development Pipeline
 
Migrating a legacy logging system: Etsy’s journey to Elastic Cloud
Migrating a legacy logging system: Etsy’s journey to Elastic CloudMigrating a legacy logging system: Etsy’s journey to Elastic Cloud
Migrating a legacy logging system: Etsy’s journey to Elastic Cloud
 
Search for All with Elastic Enterprise Search
Search for All with Elastic Enterprise Search Search for All with Elastic Enterprise Search
Search for All with Elastic Enterprise Search
 
Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...
Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...
Intelligent Cloud Conference 2018 - Next Generation of Data Integration with ...
 

Similar a Anomaly Detection Engine for Cloud Activities using Flink

ThroughTheLookingGlass_EffectiveObservability.pptx
ThroughTheLookingGlass_EffectiveObservability.pptxThroughTheLookingGlass_EffectiveObservability.pptx
ThroughTheLookingGlass_EffectiveObservability.pptxGrace Jansen
 
CyberCrime in the Cloud and How to defend Yourself
CyberCrime in the Cloud and How to defend Yourself CyberCrime in the Cloud and How to defend Yourself
CyberCrime in the Cloud and How to defend Yourself Alert Logic
 
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...Grid Dynamics
 
20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers SoftwareDevOps Chicago
 
Presentacion de solucion cloud de navegacion segura
Presentacion de solucion cloud de navegacion seguraPresentacion de solucion cloud de navegacion segura
Presentacion de solucion cloud de navegacion seguraRogerChaucaZea
 
The Art of Container Monitoring
The Art of Container MonitoringThe Art of Container Monitoring
The Art of Container MonitoringDerek Chen
 
DCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at NetflixDCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at NetflixDocker, Inc.
 
NextGen Endpoint Security for Dummies
NextGen Endpoint Security for DummiesNextGen Endpoint Security for Dummies
NextGen Endpoint Security for DummiesAtif Ghauri
 
Alabama CyberNow 2018: Cloud Hardening and Digital Forensics Readiness
Alabama CyberNow 2018: Cloud Hardening and Digital Forensics ReadinessAlabama CyberNow 2018: Cloud Hardening and Digital Forensics Readiness
Alabama CyberNow 2018: Cloud Hardening and Digital Forensics ReadinessToni de la Fuente
 
The Joy of Proactive Security
The Joy of Proactive SecurityThe Joy of Proactive Security
The Joy of Proactive SecurityAndy Hoernecke
 
All Your Security Events Are Belong to ... You!
All Your Security Events Are Belong to ... You!All Your Security Events Are Belong to ... You!
All Your Security Events Are Belong to ... You!Xavier Mertens
 
366864108 azure-security
366864108 azure-security366864108 azure-security
366864108 azure-securityober64
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Data Con LA
 
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...Codemotion
 
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National Police
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National PoliceCodeMotion Amsterdam 2018 - Microservices in action at the Dutch National Police
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National PoliceBert Jan Schrijver
 
Azure 101: Shared responsibility in the Azure Cloud
Azure 101: Shared responsibility in the Azure CloudAzure 101: Shared responsibility in the Azure Cloud
Azure 101: Shared responsibility in the Azure CloudPaulo Renato
 
Regulated Reactive - Security Considerations for Building Reactive Systems in...
Regulated Reactive - Security Considerations for Building Reactive Systems in...Regulated Reactive - Security Considerations for Building Reactive Systems in...
Regulated Reactive - Security Considerations for Building Reactive Systems in...Ryan Hodgin
 
Microsoft Azure Security Overview
Microsoft Azure Security OverviewMicrosoft Azure Security Overview
Microsoft Azure Security OverviewAlert Logic
 

Similar a Anomaly Detection Engine for Cloud Activities using Flink (20)

ThroughTheLookingGlass_EffectiveObservability.pptx
ThroughTheLookingGlass_EffectiveObservability.pptxThroughTheLookingGlass_EffectiveObservability.pptx
ThroughTheLookingGlass_EffectiveObservability.pptx
 
CyberCrime in the Cloud and How to defend Yourself
CyberCrime in the Cloud and How to defend Yourself CyberCrime in the Cloud and How to defend Yourself
CyberCrime in the Cloud and How to defend Yourself
 
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
 
20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software
 
Presentacion de solucion cloud de navegacion segura
Presentacion de solucion cloud de navegacion seguraPresentacion de solucion cloud de navegacion segura
Presentacion de solucion cloud de navegacion segura
 
Hybrid cloud monitoring - Mumbai seminar
Hybrid cloud monitoring - Mumbai seminarHybrid cloud monitoring - Mumbai seminar
Hybrid cloud monitoring - Mumbai seminar
 
The Art of Container Monitoring
The Art of Container MonitoringThe Art of Container Monitoring
The Art of Container Monitoring
 
DCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at NetflixDCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at Netflix
 
NextGen Endpoint Security for Dummies
NextGen Endpoint Security for DummiesNextGen Endpoint Security for Dummies
NextGen Endpoint Security for Dummies
 
Alabama CyberNow 2018: Cloud Hardening and Digital Forensics Readiness
Alabama CyberNow 2018: Cloud Hardening and Digital Forensics ReadinessAlabama CyberNow 2018: Cloud Hardening and Digital Forensics Readiness
Alabama CyberNow 2018: Cloud Hardening and Digital Forensics Readiness
 
The Joy of Proactive Security
The Joy of Proactive SecurityThe Joy of Proactive Security
The Joy of Proactive Security
 
All your logs are belong to you!
All your logs are belong to you!All your logs are belong to you!
All your logs are belong to you!
 
All Your Security Events Are Belong to ... You!
All Your Security Events Are Belong to ... You!All Your Security Events Are Belong to ... You!
All Your Security Events Are Belong to ... You!
 
366864108 azure-security
366864108 azure-security366864108 azure-security
366864108 azure-security
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
 
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...
 
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National Police
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National PoliceCodeMotion Amsterdam 2018 - Microservices in action at the Dutch National Police
CodeMotion Amsterdam 2018 - Microservices in action at the Dutch National Police
 
Azure 101: Shared responsibility in the Azure Cloud
Azure 101: Shared responsibility in the Azure CloudAzure 101: Shared responsibility in the Azure Cloud
Azure 101: Shared responsibility in the Azure Cloud
 
Regulated Reactive - Security Considerations for Building Reactive Systems in...
Regulated Reactive - Security Considerations for Building Reactive Systems in...Regulated Reactive - Security Considerations for Building Reactive Systems in...
Regulated Reactive - Security Considerations for Building Reactive Systems in...
 
Microsoft Azure Security Overview
Microsoft Azure Security OverviewMicrosoft Azure Security Overview
Microsoft Azure Security Overview
 

Más de Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 

Más de Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 

Último

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 

Último (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

Anomaly Detection Engine for Cloud Activities using Flink

  • 1. Anomaly Detection Engine for Cloud Activities using Flink Flink Forward Berlin 2018 Yonatan Most & Avihai Berkovitz Microsoft Cloud App Security
  • 2. Microsoft Cloud App Security Discover and assess risks Control access in real time Detect threats Protect your information Identify cloud apps on your network, gain visibility into shadow IT, and get risk assessments and ongoing analytics. Manage and limit cloud app access based on conditions and session context, including user identity, device, and location. Identify high-risk usage and detect unusual behavior using Microsoft threat intelligence and research. Get granular control over data and use built-in or custom policies for data sharing and data loss prevention. Threat detection: Microsoft Intelligent Security Graph, Office ATP Information Protection: Office 365 & Azure Information Protection Identity: Azure AD and Conditional Access To your cloud appsExtend Microsoft security + more
  • 4. • The primary data type in the product • 16,000+ supported SaaS applications • Dozens of activity types (logon, upload, export, create user…) • Locations, devices, target objects, impersonated users… • Out of order by up to 24 hours! Cloud activities
  • 6. • Goal: detect anomalous user behavior and alert the admin • The core flow is: • Analyze all the activities and extract security-oriented insights • Maintain a behavioral model for every user, and update it inline • Detect outliers, suspicious behavior or potentially malicious activity • Cross-reference with previous alerts and other users, to prevent false positives and “alert fatigue” • Reach a decision and raise an alert within seconds Activity-based threat protection
  • 7. • Scalability • Fault tolerance & recovery • Real time processing • Short time from ingestion to detection • Extendibility Detection engine requirements • Support for massive state size • State persistency • State consistency • Fast, parallel access and update of state
  • 9. • We needed a stateful stream processing framework • We really didn’t want to build one in-house • We tested 10 frameworks • Azure ML, Azure Stream Analytics, Microsoft Orleans, Apache Storm, Apache Samza, Apache Spark streaming, Apache Ignite, Apache Beam… • We chose Flink! We embarked on a search for a framework
  • 10. Anubis • Our anomaly detection engine • A single Flink job running the entire flow • Ingests activities • Outputs alerts
  • 11. • 8 clusters across multiple datacenters • Our largest cluster: • 40 machines • 25,000 events per second • 1.3 TB of state Anubis
  • 13. Anubis scalability Scalability in… Requires… Event rate per cluster Increase cluster as needed (stability cost) Event rate per user group Key by user where possible Special treatment of group-keyed operators Event rate per user Capping the user event rate Features models and detections Key by feature / model / detection + increase cluster as needed Number of clusters Easy cluster management
  • 14. • Multiple clusters in multiple datacenters • Custom standalone cluster setup • Automation scripts • Machine setup • Flink version upgrade • Job deployment and upgrade • Kubernetes-based deployment solution is in progress Flink @ Microsoft - cluster
  • 15. • We use Azure EventHubs as sources and sinks • We use Azure Blob Storage for state backend Flink @ Microsoft – connectors
  • 16. • The job and its state should last forever • We also deploy new versions frequently and update the state objects • We created a custom versioned framework based on Kryo Flink @ Microsoft – upgrades
  • 17. • Custom wrappers for operators, state access, serializers, collectors • Adding log metadata about current operator, context, timing • Adding metrics using the built-in framework • Around 15,000 metrics per process! • Everything flows to Splunk • Investigation • Dashboards • Alerts Flink @ Microsoft – monitoring
  • 18. Flink @ Microsoft – monitoring
  • 19. Flink @ Microsoft – monitoring
  • 20. Flink @ Microsoft – monitoring
  • 21. Flink @ Microsoft – monitoring
  • 22. Flink @ Microsoft – monitoring
  • 23. Flink @ Microsoft – monitoring
  • 24. • Two other Flink jobs already in production • Another two in development • Kubernetes-based deployment solution is in progress • Helping other groups within Microsoft build Flink jobs What’s next?
  • 25. © Copyright Microsoft Corporation. All rights reserved. Thank you. Questions?