Multiple Queries with Conditional Attributes (QCATs) for Anomaly Detection in Visualization
Simon Walton, Eamonn Maguire, Min Chen
Motivation and Theory
Motivation
• Anomaly detection is often hard and context-sensitive
• We usually don’t have enough annotated training data, and annotation itself is uncertain
• Many different techniques exist
• Ideally, the human should be in the loop
• The visual analytics loop!
Aims
• To develop an anomaly detection method that:
  • Is context-sensitive
  • Does not rely on supervised learning
  • Can be expanded and refined easily by the user when needed
  • Is not cost-prohibitive to run and scales linearly
Information Theory
Information is Additive
• Notion: the number of all possible answers is the amount of information
• Roll a fair die: 6 equiprobable outcomes
• What if I roll it n times?
• We can make information additive:
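A worked form of this step, assuming the Hartley measure (the base-2 logarithm of the number of equiprobable outcomes, consistent with the $\log_2$ used in the speaker notes): $n$ rolls of a fair die have $6^n$ outcomes, and taking the logarithm turns that multiplicative count into an additive quantity.

```latex
% One roll of a fair die: 6 equiprobable outcomes (assumed base-2 logarithm)
I_1 = \log_2 6 \approx 2.58\ \text{bits}
% n independent rolls: 6^n equiprobable outcomes, so information adds up
I_n = \log_2\!\left(6^n\right) = n \log_2 6 = n\, I_1
```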
But… few things are equiprobable!
• Most dice are biased
• Most coins, too
Let’s Play a Game
• 1/3 chance of getting the ball
• What, then, is the amount of information in the answer?
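A minimal Python sketch of the information carried by each answer in this game, assuming base-2 logarithms as in the speaker notes (information of an outcome with probability p is log2(1/p)). The helper name `self_information` is hypothetical.

```python
import math


def self_information(p: float) -> float:
    """Information (in bits) carried by an outcome of probability p: log2(1/p)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1 / p)


# Cups-and-balls game: the ball is under one of three cups.
found = self_information(1 / 3)      # the chosen cup hides the ball
not_found = self_information(2 / 3)  # the chosen cup is empty
print(f"found: {found:.2f} bits, not found: {not_found:.2f} bits")
# prints: found: 1.58 bits, not found: 0.58 bits
```

The rarer answer (finding the ball) carries more information than the common one, which is why equiprobable counting alone is not enough.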
Defining the Total Information
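• Average over all outcomes, i.e. weight each outcome's information by its probability:
  $\frac{1}{3}\log_2 3 + \frac{2}{3}\log_2\frac{3}{2} \approx 0.92$
• More generally, for $r$ outcomes with probabilities $p_i$:
  $\sum_{i=1}^{r} p_i \cdot \log_2\frac{1}{p_i}$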
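A small sketch of this probability-weighted average (the Shannon entropy in bits, matching the speaker notes' $\sum_{i=1}^{r} p_i \log_2(1/p_i)$), applied to the cups-and-balls game and, for comparison, to a fair die. The function name `entropy` is an assumption for illustration.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits: sum of p_i * log2(1/p_i) over outcomes with p_i > 0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


game = entropy([1 / 3, 2 / 3])  # cups-and-balls game
die = entropy([1 / 6] * 6)      # fair die: equals log2(6)
print(f"game: {game:.2f} bits, fair die: {die:.2f} bits")
# prints: game: 0.92 bits, fair die: 2.58 bits
```

For equiprobable outcomes the average reduces to the logarithm of the outcome count, so this definition generalises the additive measure from the dice example.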
Our method: QCATs
[Figure: example items (Work 1, Viagra, Single Girls, Job Opportunity, Work 2, From Mum) with scores (0.3, 0.99, 0.97, 0.82, 0.33, 0.1) from several detection techniques (Bayesian, DNS checksum, Blacklists).]
QCATs: Query with Conditional Attributes
• Dataset $A = \{a_1, a_2, \ldots, a_n\}$ with $n$ attributes
[Figure: example QCAT specification over a dataset with 10 attributes (numbered 1 to 10), with conditional attributes marked.]
QCATs: Executing a QCAT
• Take a QCAT specification: attributes month, machine, user
• Bind the conditionals: month = 11
• The instantiated QCAT, in pseudo-SQL:
    SELECT uniq(machine, user)
    WHERE month = 11
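A minimal Python sketch of executing this slide's example QCAT, under stated assumptions: rows are dictionaries keyed by attribute name, bound conditionals act as an equality filter, each unique combination of output values is one letter of the alphabet (per the speaker notes), and the letters' relative frequencies are scored with Shannon entropy. The deck lists the choice of information-theoretic measure as an open issue, so entropy is only one option here; `execute_qcat`, `letter_entropy` and the sample events are hypothetical.

```python
import math
from collections import Counter
from typing import Any, Mapping, Sequence

Row = Mapping[str, Any]


def execute_qcat(rows: Sequence[Row], outputs: Sequence[str],
                 bindings: Mapping[str, Any]) -> Counter:
    """Instantiate a QCAT by binding its conditional attributes, then count each
    unique combination of output values (one 'letter' of the alphabet).

    Roughly: SELECT uniq(outputs...) WHERE cond_1 = v_1 AND ...
    """
    letters: Counter = Counter()
    for row in rows:  # single pass over the rows
        if all(row.get(attr) == value for attr, value in bindings.items()):
            letters[tuple(row[attr] for attr in outputs)] += 1
    return letters


def letter_entropy(letters: Counter) -> float:
    """Shannon entropy (bits) of the letter frequencies; 0.0 for an empty result."""
    total = sum(letters.values())
    if total == 0:
        return 0.0
    return sum((n / total) * math.log2(total / n) for n in letters.values())


# Hypothetical events; attribute names follow the slide's example.
events = [
    {"month": 11, "machine": "PC-1", "user": "alice"},
    {"month": 11, "machine": "PC-1", "user": "alice"},
    {"month": 11, "machine": "PC-2", "user": "bob"},
    {"month": 12, "machine": "PC-3", "user": "alice"},
]
letters = execute_qcat(events, outputs=["machine", "user"], bindings={"month": 11})
print(dict(letters))                       # {('PC-1', 'alice'): 2, ('PC-2', 'bob'): 1}
print(f"{letter_entropy(letters):.3f} bits")  # 0.918 bits
```

Each row is visited once and checked against a fixed set of attributes, so the cost of this sketch grows linearly with the number of rows and with the number of attributes involved.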
Combining QCATs
xth percentile
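The slide only names the x-th percentile. One plausible reading, shown purely as a hypothetical sketch: each QCAT yields a score per entity (for example, per user), an entity is flagged by a QCAT when its score is at or above that QCAT's x-th percentile, and the flags are counted across QCATs. The function names, the nearest-rank percentile, the "higher score is more anomalous" convention and the vote-count rule are all assumptions, not the authors' method.

```python
import math
from typing import Dict, List, Mapping


def percentile(values: List[float], x: float) -> float:
    """Nearest-rank x-th percentile (0 < x <= 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(x / 100 * len(ordered)))
    return ordered[rank - 1]


def combine_qcats(scores: Mapping[str, Mapping[str, float]], x: float) -> Dict[str, int]:
    """Hypothetical combination rule: for each QCAT, flag entities scoring at or
    above that QCAT's x-th percentile, then count the flags per entity."""
    votes: Dict[str, int] = {}
    for per_entity in scores.values():
        threshold = percentile(list(per_entity.values()), x)
        for entity, score in per_entity.items():
            votes[entity] = votes.get(entity, 0) + int(score >= threshold)
    return votes


# Hypothetical per-user scores from two QCATs.
scores = {
    "qcat_a": {"u1": 0.1, "u2": 0.4, "u3": 2.0, "u4": 0.3},
    "qcat_b": {"u1": 1.5, "u2": 0.2, "u3": 1.9, "u4": 0.1},
}
votes = combine_qcats(scores, x=75)
print(votes)  # {'u1': 1, 'u2': 1, 'u3': 2, 'u4': 0}
```

Thresholding each QCAT against its own percentile keeps QCATs with different score ranges comparable before their results are combined.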
A Visual Analytical Workflow for QCATs
Goals
• An effective UI for designing QCATs
• The visual analytics loop (shown below) is ideal for this
• This system would primarily be used by the model designer
• A modified version for the analyst, with additional tool support
• A simplified visualisation (e.g. time-series) for the observers
[Figure: the visual analytics loop, with stages Data, Models (QCATs), Visualisation and Knowledge.]
CMU-CERT Dataset
• http://www.cert.org/insider-threat/tools/index.cfm
• Contains known ground truth for insider-threat scenarios
• Each event is linked to a user
• Email (20m): time, user, machine, to (incl. CC, BCC), from, size, number of attachments, content
• Web (3.5m): time, user, machine, url
• Device (1.24m): time, user, machine id, [insert/remove]
• Logon/off (2.6m): time, user, machine, [logon/logoff]
QCAT Workflow
• Let’s add a QCAT!
Multiple QCATs
• The real power comes from multiple QCATs:
  • To compare performance
  • To combine results for analysis
• So let’s add another!
Implementation: Scalability
• Linear in the number of columns
• Linear in the number of rows
Discussion and Future Work
• Future work:
  • Understanding how mutual information can be represented
  • The choice of information-theoretic measure is still an open issue
  • Binning strategies and assisted bin design
But wait! There’s more!
Questions?
• CC-licensed photo acknowledgements:
  • Coins - https://www.flickr.com/photos/wwarby
  • Cups and Balls - https://www.flickr.com/photos/eschipul
  • Dice - https://www.flickr.com/photos/darwinbell

Vissec2014


Speaker notes

  1. $\log_2\frac{1}{1/3} = \log_2\frac{3}{1} = \log_2 3 \approx 1.58$; $\log_2\frac{3}{2} \approx 0.58$
  2. $\frac{1}{3}\log_2 3 + \frac{2}{3}\log_2\frac{3}{2} \approx 0.92$; $\sum_{i=1}^{r} p_i \cdot \log_2\frac{1}{p_i}$
  3. 10 attributes
  4. Each unique combination of variants in the result set is considered a letter in the alphabet