Applying Machine Learning to Network Security Monitoring
Alexandre Pinto
Chief Data Scientist | MLSec Project
@alexcpsec
@MLSecProject

WARNING!
•  This is a talk about BUILDING, not breaking
–  NO systems were harmed in the development of this talk.
–  This is NOT about 1337 Android Malware
•  The only thing we are likely to break here is the time limit on the talk
•  This talk includes more MATH than the daily recommended intake by the FDA.
•  All stunts described in this talk were performed by trained professionals.

Who's Alex?
•  13 years in Information Security, done a little bit of everything.
•  Past 7 or so years leading security consultancy and monitoring teams in Brazil, London and the US.
–  If there is any way a SIEM can hurt you, it did to me.
•  Researching machine learning and data science in general for the past year or so, and presenting on their intersection with InfoSec throughout the year.
•  Created MLSec Project in July 2013 to give structure to the research being done.

Agenda
•  Definitions
•  Big Data
•  Data Science
•  Machine Learning
•  Y U DO DIS?
•  Network Security Monitoring
•  PoC || GTFO
•  Feature Intuition
•  How to get started?

Big Data + Machine Learning + Data Science
Big Data

(Security) Data Scientist
•  “Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”
-- Josh Wills, Cloudera
Data Science Venn Diagram by Drew Conway

Enter Machine Learning
•  “Machine learning systems automatically learn programs from data” (*)
•  You don’t really code the program, but it is inferred from data.
•  Intuition of trying to mimic the way the brain learns: that's where terms like artificial intelligence come from.
(*) CACM 55(10) - A Few Useful Things to Know about Machine Learning (Domingos 2012)

Kinds of Machine Learning
•  Supervised Learning:
–  Classification (NN, SVM, Naïve Bayes)
–  Regression (linear, logistic)
•  Unsupervised Learning:
–  Clustering (k-means)
–  Decomposition (PCA, SVD)
Source: scikit-learn.github.io/scikit-learn-tutorial/general_concepts.html

Classification Example
VS
Regression Example

Considerations on Data Gathering
•  Models will (generally) get better with more data
–  But we always have to consider bias and variance as we select our data points
–  Also adversaries: we may be force-fed “bad data”, find signal in weird noise or design bad (or exploitable) features
•  “I’ve got 99 problems, but data ain’t one”
Domingos, 2012
Abu-Mostafa, Caltech, 2012

Applications of Machine Learning
•  Sales
•  Trading
•  Image and Voice Recognition

Y U DO DIS?
•  Common reactions from Security Professionals:
•  “Eh, cool…” *blank stare* *walks away*
•  “Are you high, bro?”
•  “Why aren’t you doing some cool research like Android Malware?”
Math is HARD

Security Applications of ML
•  Fraud detection systems:
–  Is what he just did consistent with past behavior?
•  Network anomaly detection (?):
–  More like bad statistical analysis
–  Did not advance a lot, IMO
•  Predicting likelihood of attack actors
–  Create different predictive models and chain them to gain more confidence in each step.
•  SPAM filters

Considerations on Data Gathering
•  Adversaries - Exploiting the learning process
•  Understand the model, understand the machine, and you can circumvent it
•  Something the InfoSec community knows very well
•  Any predictive model on InfoSec will be pushed to the limit
•  Again, think back on the way SPAM engines evolved.

Network Security Monitoring
Correlation Rules: A Primer
•  Rules in a SIEM solution invariably are:
–  “Something” has happened “x” times;
–  “Something” has happened and other “something2” has happened, with some relationship (time, same fields, etc.) between them.
•  Configuring SIEM = iterate on combinations until:
–  Customer or management is foole.. I mean satisfied;
–  Consulting money runs out
•  Behavioral rules (anomaly detection) help a bit with the “x”s, but are still very laborious and time consuming.

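The first rule shape above (“something” has happened “x” times) can be sketched as a sliding-window counter. This is only an illustration: the function name `threshold_rule`, the event tuples and the 60-second window are assumptions made for the example, not taken from any particular SIEM product.

```python
from collections import defaultdict, deque


def threshold_rule(events, x, window_seconds):
    """Return (key, timestamp) alerts when `key` occurs `x` times within the window.

    `events` is an iterable of (timestamp_seconds, key) pairs sorted by time.
    """
    recent = defaultdict(deque)  # key -> timestamps still inside the window
    alerts = []
    for ts, key in events:
        q = recent[key]
        q.append(ts)
        # Drop occurrences that fell out of the sliding window.
        while q and ts - q[0] > window_seconds:
            q.popleft()
        if len(q) >= x:
            alerts.append((key, ts))
    return alerts


# Hypothetical events: (seconds since start, "source event-type").
events = [
    (0, "10.0.0.5 failed-login"),
    (20, "10.0.0.5 failed-login"),
    (25, "10.0.0.9 failed-login"),
    (40, "10.0.0.5 failed-login"),
    (500, "10.0.0.5 failed-login"),
]
alerts = threshold_rule(events, x=3, window_seconds=60)
print(alerts)
```

Both `x` and the window have to be picked by hand for every rule, which is the laborious iteration the slide complains about.
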
Kinds of Network Security Monitoring
•  Alert-based:
–  “Traditional” log management
–  SIEM
–  Using “Threat Intelligence” (i.e. blacklists) for about a year or so
–  Lack of context
–  Low effectiveness
–  You get the results handed over to you
•  Exploration-based:
–  Network Forensics tools (2/3 years ago)
–  Elastic Search based LM systems
–  High effectiveness
–  Lots of people necessary
–  Lots of HIGHLY trained people
•  Big Data Security Analytics (BDSA):
–  Run exploration-based monitoring on Hadoop
–  More like Big Data Security Monitoring (BDSM)

Alert-based + Exploration-based
A wild army of robots appears
Using robots to catch bad guys

PoC || GTFO
•  We developed a set of algorithms to detect malicious behavior from log entries of firewall blocks
•  Over 6 months of data from SANS DShield (thanks, guys!)
•  After a lot of statistical-based math (true positive ratio, true negative ratio, odds likelihood), it could pinpoint actors that would be 13x-18x more likely to attack you.
•  Today more like 30x on the SANS data, and finding around 80% of “badness” in participant deployments.

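One standard way to turn a true positive ratio and a true negative ratio into an “N times more likely” figure is the positive likelihood ratio, TPR / (1 - TNR). The sketch below assumes that reading; the slides do not spell out the exact formula used, and the counts are made up for illustration.

```python
def positive_likelihood_ratio(tp, fn, tn, fp):
    """LR+ = TPR / (1 - TNR), computed from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # true positive ratio (sensitivity)
    tnr = tn / (tn + fp)  # true negative ratio (specificity)
    if tnr == 1.0:
        return float("inf")  # no false positives at all
    return tpr / (1.0 - tnr)


# Hypothetical counts, NOT results from the talk.
lr = positive_likelihood_ratio(tp=80, fn=20, tn=950, fp=50)
print(f"flagged actors are about {lr:.0f}x more likely to attack")
```

With these invented counts the model catches 80% of attackers while flagging 5% of benign actors, so a flagged actor is roughly 16 times more likely to be an attacker than an unflagged one.
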
Feature Intuition: IP Proximity
•  Assumptions to aggregate the data
•  Correlation / proximity / similarity BY BEHAVIOR
•  “Bad Neighborhoods” concept:
–  Spamhaus x CyberBunker
–  Google Report (June 2013)
–  Moura 2013
•  Group by Geolocation
•  Group by Netblock (/16, /24)
•  Group by ASN
–  (thanks, Team Cymru)

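Grouping by netblock can be sketched with Python's standard `ipaddress` module. The helper names and the sample addresses (taken from documentation ranges) are assumptions for illustration. Geolocation and ASN grouping need external lookup data and are not shown.

```python
import ipaddress
from collections import Counter


def netblock(ip, prefix_len):
    """Return the enclosing network of `ip` at the given prefix length, as a string."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def count_by_netblock(ips, prefix_len):
    """Count how many of the given addresses fall into each netblock."""
    return Counter(netblock(ip, prefix_len) for ip in ips)


# Hypothetical source addresses of firewall blocks.
blocked = ["198.51.100.7", "198.51.100.200", "198.51.7.1", "203.0.113.9"]
by_24 = count_by_netblock(blocked, 24)
by_16 = count_by_netblock(blocked, 16)
print(by_24.most_common())
print(by_16.most_common())
```

The per-netblock counts can then serve as a “neighborhood” feature for each individual address.
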
Map of the Internet
•  (Hilbert Curve)
•  Block port 22
•  2013-07-20
[Figure labels: 0, 10, 127, MULTICAST AND FRIENDS, “You are here!”, CN, BR, TH, CN, RU]
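
A Hilbert-curve map keeps numerically adjacent address blocks next to each other on a 2D grid. Below is a minimal sketch of the usual index-to-coordinate conversion, assuming each of the 256 /8 blocks gets one tile on a 16x16 grid. The orientation and layout of the original figure are not known, so this is an illustration rather than a reproduction.

```python
def d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so the sub-curves connect end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# One tile per first octet (0..255) on a 16x16 grid.
tiles = {octet: d2xy(16, octet) for octet in range(256)}
print(tiles[10], tiles[127])
```

Consecutive /8 blocks always land on neighboring tiles, which is why regional clusters show up as contiguous patches on such a map.
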
  
Feature Intuition: Temporal Decay
•  Even bad neighborhoods renovate:
–  Attackers may change ISPs/proxies
–  Botnets may be shut down / relocate
–  A little paranoia is OK, but not EVERYONE is out to get you (at least not all at once)
•  As days pass, let's forget, bit by bit, who attacked
•  Last time I saw this actor, and how often did I see them

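One simple way to “forget bit by bit” is exponential decay: every sighting contributes a weight that halves after a fixed number of days. The decay function and the 7-day half-life are assumptions for this sketch; the slides do not say which decay was actually used.

```python
import math


def decayed_score(sighting_days, today, half_life_days=7.0):
    """Sum of exponentially decayed weights, one per past sighting.

    Combines recency (old sightings weigh less) and frequency (more sightings add up).
    """
    rate = math.log(2) / half_life_days
    return sum(
        math.exp(-rate * (today - day)) for day in sighting_days if day <= today
    )


# Hypothetical actors, sightings given as day numbers.
recent_actor = decayed_score([28, 29, 30], today=30)
stale_actor = decayed_score([1, 2, 3], today=30)
print(round(recent_actor, 3), round(stale_actor, 3))
```

Both actors were seen three times, but the one seen this week scores far higher than the one seen a month ago.
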
MLSec Project
•  Behavior: block on port 22
•  Trial inference on 100k IP addresses per Class A subnet
•  Logarithmic scale: brightest tiles are 10 to 1000 times more likely to attack.

Feature Intuition: DNS features
•  Who resolves to this IP address?
•  Number of domains that resolve to the IP address
•  Distribution of their lifetime
•  Entropy, size, ccTLDs
•  Registrar information
•  Reverse DNS information…
•  History of DNS registration…
•  (Thanks, DNSDB!)
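
Of the features listed, entropy is the easiest to show in a few lines. The sketch below computes the Shannon entropy of the characters in a domain label, on the assumption that “entropy” here means character-level entropy of the name. The sample labels are made up.

```python
import math
from collections import Counter


def shannon_entropy(label):
    """Shannon entropy, in bits per character, of the characters in `label`."""
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical labels: a dictionary-like name vs. a random-looking one.
for name in ("google", "x7f3k9q2zt"):
    print(name, round(shannon_entropy(name), 3))
```

Random-looking names, such as those produced by domain generation algorithms, tend to score higher than human-chosen ones, so the value is useful as one numeric feature among many rather than as a verdict on its own.
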
  
Training the Model
•  YAY! We have a bunch of numbers per IP address/domain!
•  How do you define what is malicious or not?
•  “Advanced expertise in both information security and data science will be a necessary ingredient in enabling accurate discrimination between malicious and benign activity.”
- Anton Chuvakin, Gartner
•  Kinda easy for security tools (if you trust them)
•  Web application logs need deeper statistical analysis
•  Not normal / standard deviation thing

How do I get started on this?
•  Programming is a must (Python / R)
•  Statistical knowledge keeps you from making dumb mistakes
•  Specific machine learning courses and books:
–  Coursera (ML / Data Analysis / Data Science)
•  Practice, Practice, Practice:
–  Explore your data! (Security Onion)
–  Kaggle
–  KDD, VAST, VizSec

MLSec Project
•  Sign up, send logs, receive reports generated by machine learning models!
•  Working with several companies on trying out these models on their environment with their data
•  We are hiring (KINDA)
•  Visit https://www.mlsecproject.org, message @MLSecProject or just e-mail me.

MLSec Project - Current Research
•  Inbound attacks on exposed services (DEFCON/BH 2013):
–  Information from inbound connections on firewalls, IPS, WAFs
–  Feature extraction and supervised learning
•  Malware Distribution and Botnets:
–  Information from outbound connections on firewalls, DNS and Web Proxy
–  Initial labeling provided by intelligence feeds and AV/anti-malware
–  Semi-supervised learning involved
•  Kill-chain Ensemble Models:
–  Increased precision by composing different behaviors
–  Web server path -> go through Firewall, then IPS, then WAF
–  Early confirmation of attack failure or success
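
One way to picture “composing different behaviors” is a naive-Bayes style update: start from a prior probability that an actor is malicious and multiply the odds by one likelihood ratio per stage that flags it. This treats the stages as independent, which is a strong assumption. It is not presented in the slides as the method actually used, and every number below is hypothetical.

```python
def posterior_probability(prior, likelihood_ratios):
    """Update a prior probability with a chain of likelihood ratios.

    Assumes the stages (e.g. firewall, IPS, WAF models) are conditionally independent.
    """
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr  # each flagged stage multiplies the odds
    return odds / (1.0 + odds)


# Hypothetical: 1% prior, then firewall, IPS and WAF models each flag the actor.
firewall_only = posterior_probability(prior=0.01, likelihood_ratios=[15.0])
full_chain = posterior_probability(prior=0.01, likelihood_ratios=[15.0, 4.0, 6.0])
print(round(firewall_only, 3), round(full_chain, 3))
```

Each additional stage that agrees pushes the estimate up, which is the intuition behind gaining confidence at every step of the chain.
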
  
Thanks!
•  Q&A?
•  Feedback?
Alexandre Pinto
@alexcpsec
@MLSecProject
https://www.mlsecproject.org/
"Essentially, all models are wrong, but some are useful."
- George E. P. Box

Applying Machine Learning to Network Security Monitoring - BayThreat 2013

Defcon 21-pinto-defending-networks-machine-learning by pseudor00t
Defcon 21-pinto-defending-networks-machine-learning by pseudor00tDefcon 21-pinto-defending-networks-machine-learning by pseudor00t
Defcon 21-pinto-defending-networks-machine-learning by pseudor00tpseudor00t overflow
 
Delivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationDelivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationRaffael Marty
 
Civilian OPSEC in cyberspace
Civilian OPSEC  in cyberspaceCivilian OPSEC  in cyberspace
Civilian OPSEC in cyberspacezapp0
 
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDefcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDaveEdwards12
 
Bar Camp 11 Oct09 Hacking
Bar Camp 11 Oct09 HackingBar Camp 11 Oct09 Hacking
Bar Camp 11 Oct09 HackingBarcamp Kerala
 
Attack Simulation and Hunting
Attack Simulation and HuntingAttack Simulation and Hunting
Attack Simulation and Huntingnathi mogomotsi
 
Intro to INFOSEC
Intro to INFOSECIntro to INFOSEC
Intro to INFOSECSean Whalen
 
Sasa milic, cisco advanced malware protection
Sasa milic, cisco advanced malware protectionSasa milic, cisco advanced malware protection
Sasa milic, cisco advanced malware protectionDejan Jeremic
 
Hacking - penetration tools
Hacking - penetration toolsHacking - penetration tools
Hacking - penetration toolsJenishChauhan4
 
Managing Next Generation Threats to Cyber Security
Managing Next Generation Threats to Cyber SecurityManaging Next Generation Threats to Cyber Security
Managing Next Generation Threats to Cyber SecurityPriyanka Aash
 
Fundamentals of Network security
Fundamentals of Network securityFundamentals of Network security
Fundamentals of Network securityAPNIC
 
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDefcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwarePriyanka Aash
 
Advanced Threats and Lateral Movement Detection
Advanced Threats and Lateral Movement DetectionAdvanced Threats and Lateral Movement Detection
Advanced Threats and Lateral Movement DetectionGreg Foss
 
Luiz eduardo. introduction to mobile snitch
Luiz eduardo. introduction to mobile snitchLuiz eduardo. introduction to mobile snitch
Luiz eduardo. introduction to mobile snitchYury Chemerkin
 
ISACA Ethical Hacking Presentation 10/2011
ISACA Ethical Hacking Presentation 10/2011ISACA Ethical Hacking Presentation 10/2011
ISACA Ethical Hacking Presentation 10/2011Xavier Mertens
 
Keith J. Jones, Ph.D. - Crash Course malware analysis
Keith J. Jones, Ph.D. - Crash Course malware analysisKeith J. Jones, Ph.D. - Crash Course malware analysis
Keith J. Jones, Ph.D. - Crash Course malware analysisKeith Jones, PhD
 
UMS Cybersecurity Awareness Seminar: Cybersecurity - Lessons learned from sec...
UMS Cybersecurity Awareness Seminar: Cybersecurity - Lessons learned from sec...UMS Cybersecurity Awareness Seminar: Cybersecurity - Lessons learned from sec...
UMS Cybersecurity Awareness Seminar: Cybersecurity - Lessons learned from sec...APNIC
 
Threat Hunting by Falgun Rathod - Cyber Octet Private Limited
Threat Hunting by Falgun Rathod - Cyber Octet Private LimitedThreat Hunting by Falgun Rathod - Cyber Octet Private Limited
Threat Hunting by Falgun Rathod - Cyber Octet Private LimitedFalgun Rathod
 

Similar a Applying Machine Learning to Network Security Monitoring - BayThreat 2013 (20)

Defcon 21-pinto-defending-networks-machine-learning by pseudor00t
Defcon 21-pinto-defending-networks-machine-learning by pseudor00tDefcon 21-pinto-defending-networks-machine-learning by pseudor00t
Defcon 21-pinto-defending-networks-machine-learning by pseudor00t
 
Delivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationDelivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and Visualization
 
Civilian OPSEC in cyberspace
Civilian OPSEC  in cyberspaceCivilian OPSEC  in cyberspace
Civilian OPSEC in cyberspace
 
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDefcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
 
Bar Camp 11 Oct09 Hacking
Bar Camp 11 Oct09 HackingBar Camp 11 Oct09 Hacking
Bar Camp 11 Oct09 Hacking
 
Attack Simulation and Hunting
Attack Simulation and HuntingAttack Simulation and Hunting
Attack Simulation and Hunting
 
Intro to INFOSEC
Intro to INFOSECIntro to INFOSEC
Intro to INFOSEC
 
Sasa milic, cisco advanced malware protection
Sasa milic, cisco advanced malware protectionSasa milic, cisco advanced malware protection
Sasa milic, cisco advanced malware protection
 
Hacking - penetration tools
Hacking - penetration toolsHacking - penetration tools
Hacking - penetration tools
 
Osint
OsintOsint
Osint
 
Managing Next Generation Threats to Cyber Security
Managing Next Generation Threats to Cyber SecurityManaging Next Generation Threats to Cyber Security
Managing Next Generation Threats to Cyber Security
 
Fundamentals of Network security
Fundamentals of Network securityFundamentals of Network security
Fundamentals of Network security
 
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDefcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
 
Computer Security
Computer SecurityComputer Security
Computer Security
 
Advanced Threats and Lateral Movement Detection
Advanced Threats and Lateral Movement DetectionAdvanced Threats and Lateral Movement Detection
Advanced Threats and Lateral Movement Detection
 
Luiz eduardo. introduction to mobile snitch
Luiz eduardo. introduction to mobile snitchLuiz eduardo. introduction to mobile snitch
Luiz eduardo. introduction to mobile snitch
 
ISACA Ethical Hacking Presentation 10/2011
ISACA Ethical Hacking Presentation 10/2011ISACA Ethical Hacking Presentation 10/2011
ISACA Ethical Hacking Presentation 10/2011
 
Keith J. Jones, Ph.D. - Crash Course malware analysis
Keith J. Jones, Ph.D. - Crash Course malware analysisKeith J. Jones, Ph.D. - Crash Course malware analysis
Keith J. Jones, Ph.D. - Crash Course malware analysis
 
UMS Cybersecurity Awareness Seminar: Cybersecurity - Lessons learned from sec...
UMS Cybersecurity Awareness Seminar: Cybersecurity - Lessons learned from sec...UMS Cybersecurity Awareness Seminar: Cybersecurity - Lessons learned from sec...
UMS Cybersecurity Awareness Seminar: Cybersecurity - Lessons learned from sec...
 
Threat Hunting by Falgun Rathod - Cyber Octet Private Limited
Threat Hunting by Falgun Rathod - Cyber Octet Private LimitedThreat Hunting by Falgun Rathod - Cyber Octet Private Limited
Threat Hunting by Falgun Rathod - Cyber Octet Private Limited
 

Último

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Powerpoint exploring the locations used in television show Time Clash
Applying Machine Learning to Network Security Monitoring - BayThreat 2013

  • 1. Applying Machine Learning to Network Security Monitoring
    Alexandre Pinto
    Chief Data Scientist | MLSec Project
    @alexcpsec @MLSecProject
  • 2. WARNING!
    • This is a talk about BUILDING not breaking
      – NO systems were harmed on the development of this talk.
      – This is NOT about 1337 Android Malware
    • Only thing we are likely to break here is the time limit on the talk
    • This talk includes more MATH than the daily recommended intake by the FDA.
    • All stunts described in this talk were performed by trained professionals.
  • 3. Who's Alex?
    • 13 years in Information Security, done a little bit of everything.
    • Past 7 or so years leading security consultancy and monitoring teams in Brazil, London and the US.
      – If there is any way a SIEM can hurt you, it did to me.
    • Researching machine learning and data science in general for the past year or so and presenting about the intersection of it and Infosec throughout the year.
    • Created MLSec Project in July 2013 to give structure to the research being done.
  • 4. Agenda
    • Definitions
    • Big Data
    • Data Science
    • Machine Learning
    • Y U DO DIS?
    • Network Security Monitoring
    • PoC || GTFO
    • Feature Intuition
    • How to get started?
  • 5. Big Data + Machine Learning + Data Science
  • 6. Big Data + Machine Learning + Data Science
  • 8. (Security) Data Scientist
    • “Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.” -- Josh Wills, Cloudera
    Data Science Venn Diagram by Drew Conway
  • 9. Enter Machine Learning
    • “Machine learning systems automatically learn programs from data” (*)
    • You don’t really code the program, but it is inferred from data.
    • Intuition of trying to mimic the way the brain learns: that's where terms like artificial intelligence come from.
    (*) CACM 55(10) - A Few Useful Things to Know about Machine Learning (Domingos 2012)
  • 10. Kinds of Machine Learning
    • Supervised Learning:
      – Classification (NN, SVM, Naïve Bayes)
      – Regression (linear, logistic)
    • Unsupervised Learning:
      – Clustering (k-means)
      – Decomposition (PCA, SVD)
    Source: scikit-learn.github.io/scikit-learn-tutorial/general_concepts.html
  • 13. Considerations on Data Gathering
    • Models will (generally) get better with more data
      – But we always have to consider bias and variance as we select our data points
      – Also adversaries – we may be force fed “bad data”, find signal in weird noise or design bad (or exploitable) features
    • “I’ve got 99 problems, but data ain’t one”
    Domingos, 2012; Abu-Mostafa, Caltech, 2012
  • 14. Applications of Machine Learning
    • Sales
    • Trading
    • Image and Voice Recognition
  • 15. Y U DO DIS?
    • Common reactions from Security Professionals:
      – “Eh, cool…” *blank stare* *walks away*
      – “Are you high, bro?”
      – “Why aren’t you doing some cool research like Android Malware?”
  • 17. Security Applications of ML
    • Fraud detection systems:
      – Is what he just did consistent with past behavior?
    • Network anomaly detection (?):
      – More like bad statistical analysis
      – Did not advance a lot, IMO
    • Predicting likelihood of attack actors
      – Create different predictive models and chain them to gain more confidence in each step.
    • SPAM filters
  • 18. Considerations on Data Gathering
    • Adversaries - Exploiting the learning process
    • Understand the model, understand the machine, and you can circumvent it
    • Something InfoSec community knows very well
    • Any predictive model on InfoSec will be pushed to the limit
    • Again, think back on the way SPAM engines evolved.
  • 20. Correlation Rules: A Primer
    • Rules in a SIEM solution invariably are:
      – “Something” has happened “x” times;
      – “Something” has happened and other “something2” has happened, with some relationship (time, same fields, etc) between them.
    • Configuring SIEM = iterate on combinations until:
      – Customer or management is foole.. I mean satisfied;
      – Consulting money runs out
    • Behavioral rules (anomaly detection) help a bit with the “x”s, but still, very laborious and time consuming.
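
The two rule shapes on this slide can be sketched in a few lines of Python. This is a minimal illustration, not how any particular SIEM implements correlation: the event tuple layout `(timestamp_seconds, source, event_type)`, the function names, and the sample events are all assumptions made for the example.

```python
from collections import defaultdict, deque


def threshold_rule(events, event_type, x, window):
    """'Something' has happened x times: alert when one source produces
    `x` events of `event_type` within `window` seconds."""
    recent = defaultdict(deque)  # source -> timestamps still inside the window
    alerts = []
    for ts, source, etype in sorted(events):
        if etype != event_type:
            continue
        q = recent[source]
        q.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while q and ts - q[0] > window:
            q.popleft()
        if len(q) >= x:
            alerts.append((source, ts))
            q.clear()  # start counting again after an alert
    return alerts


def sequence_rule(events, first, second, window):
    """'Something' then 'something2': alert when `second` follows `first`
    from the same source (the shared field) within `window` seconds."""
    last_first = {}  # source -> timestamp of the latest `first` event
    alerts = []
    for ts, source, etype in sorted(events):
        if etype == first:
            last_first[source] = ts
        elif (etype == second and source in last_first
              and ts - last_first[source] <= window):
            alerts.append((source, ts))
    return alerts


# Hypothetical sample data: (timestamp_seconds, source, event_type)
EVENTS = [
    (0, "10.0.0.5", "login_failed"),
    (10, "10.0.0.5", "login_failed"),
    (20, "10.0.0.5", "login_failed"),
    (25, "10.0.0.5", "login_ok"),
    (30, "10.0.0.9", "login_failed"),
]

print(threshold_rule(EVENTS, "login_failed", 3, 60))
print(sequence_rule(EVENTS, "login_failed", "login_ok", 30))
```

Every value of `x` and `window` here is a hand-picked guess, which is the tuning loop the slide complains about.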
  • 21. Kinds of Network Security Monitoring
    • Alert-based:
      – “Traditional” log management
      – SIEM
      – Using “Threat Intelligence” (i.e. blacklists) for about a year or so
      – Lack of context
      – Low effectiveness
      – You get the results handed over to you
    • Exploration-based:
      – Network Forensics tools (2/3 years ago)
      – Elastic Search based LM systems
      – High effectiveness
      – Lots of people necessary
      – Lots of HIGHLY trained people
    • Big Data Security Analytics (BDSA):
      – Run exploration-based monitoring on Hadoop
      – More like Big Data Security Monitoring (BDSM)
  • 23. A wild army of robots appears
  • 24. Using robots to catch bad guys
  • 25. PoC || GTFO
    • We developed a set of algorithms to detect malicious behavior from log entries of firewall blocks
    • Over 6 months of data from SANS DShield (thanks, guys!)
    • After a lot of statistical-based math (true positive ratio, true negative ratio, odds likelihood), it could pinpoint actors that would be 13x-18x more likely to attack you.
    • Today more like 30x on the SANS data, and finding around 80% of “badness” in participant deployments.
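
The slide does not give the exact formulas behind "13x-18x more likely". One standard way to turn a true positive ratio and a true negative ratio into an "N times more likely" figure is the positive likelihood ratio, LR+ = TPR / (1 - TNR). The sketch below assumes that reading; the confusion-matrix counts and the 1% prior are made-up numbers, not results from the talk.

```python
def rates(tp, fn, tn, fp):
    """True positive ratio (sensitivity) and true negative ratio (specificity)."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return tpr, tnr


def positive_likelihood_ratio(tp, fn, tn, fp):
    """LR+ = TPR / FPR: how much more often the model flags a true attacker
    than it flags a benign actor."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    if fpr == 0:
        return float("inf")  # no false positives observed
    return tpr / fpr


def posterior_probability(prior, lr):
    """Update a prior probability of maliciousness with a likelihood ratio,
    working in odds: posterior_odds = prior_odds * lr."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * lr
    return post_odds / (1.0 + post_odds)


# Hypothetical counts: 100 known attackers, 1000 known benign actors.
lr = positive_likelihood_ratio(tp=80, fn=20, tn=950, fp=50)
print(round(lr, 2))
print(round(posterior_probability(0.01, lr), 4))
```

With these invented counts a flagged actor is about 16 times more likely to be an attacker than an unflagged baseline would suggest, which is the kind of statement the slide makes.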
  • 26. Feature Intuition: IP Proximity
    • Assumptions to aggregate the data
    • Correlation / proximity / similarity BY BEHAVIOR
    • “Bad Neighborhoods” concept:
      – Spamhaus x CyberBunker
      – Google Report (June 2013)
      – Moura 2013
    • Group by Geolocation
    • Group by Netblock (/16, /24)
    • Group by ASN
      – (thanks, Team Cymru)
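
Grouping by netblock needs nothing beyond Python's standard `ipaddress` module. The sketch below counts firewall blocks per /24 or /16; the function names and sample addresses (taken from the documentation ranges) are assumptions for illustration. Grouping by geolocation or ASN would need an external lookup source and is not shown.

```python
import ipaddress
from collections import Counter


def netblock(ip, prefix):
    """Return the enclosing network of `ip` at the given prefix length,
    e.g. 203.0.113.7 at /24 -> '203.0.113.0/24'."""
    # strict=False lets us pass a host address and have the host bits masked.
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def blocks_per_netblock(blocked_ips, prefix):
    """Count blocked source IPs per netblock: a simple 'bad neighborhood' feature."""
    return Counter(netblock(ip, prefix) for ip in blocked_ips)


# Hypothetical blocked sources from a firewall log.
BLOCKED = ["203.0.113.7", "203.0.113.99", "198.51.100.10"]

print(blocks_per_netblock(BLOCKED, 24))
print(blocks_per_netblock(BLOCKED, 16))
```

An IP that has never been seen before can then inherit a score from how badly its /24 or /16 neighbors have behaved.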
  • 27. Map of the Internet (Hilbert Curve), block port 22, 2013-07-20
    [Figure: Hilbert-curve map of the IPv4 address space. Labels on the map: 0, 10, 127, “MULTICAST AND FRIENDS”, “You are here!”, CN, BR, TH, RU]
  • 28. Feature Intuition: Temporal Decay
    • Even bad neighborhoods renovate:
      – Attackers may change ISPs/proxies
      – Botnets may be shut down / relocate
      – A little paranoia is Ok, but not EVERYONE is out to get you (at least not all at once)
    • As days pass, let's forget, bit by bit, who attacked
    • Last time I saw this actor, and how often did I see them
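
The slide does not say which decay function is used. A common choice for "forgetting bit by bit" is exponential decay with a half-life, which captures both recency (last time seen) and frequency (how often seen) in one number. The sketch below assumes that choice; the 7-day half-life and the day-number timestamps are illustrative only.

```python
def decayed_score(sighting_days, today, half_life_days=7.0):
    """Sum of past sightings, each weighted by 0.5 ** (age / half_life).

    A sighting today counts 1.0, one half-life ago counts 0.5, and so on,
    so frequent recent activity scores high and old activity fades out.
    Sightings dated after `today` are ignored.
    """
    return sum(
        0.5 ** ((today - day) / half_life_days)
        for day in sighting_days
        if day <= today
    )


# Hypothetical actors, with the day numbers on which each was blocked.
print(decayed_score([0], today=7))        # seen once, a week ago
print(decayed_score([0, 7], today=7))     # seen a week ago and again today
print(decayed_score([0, 7], today=70))    # same actor, long quiet since
```

A shorter half-life forgives faster; the right value depends on how quickly attackers in a given data set rotate addresses.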
  • 29. MLSec Project
    • Behavior: block on port 22
    • Trial inference on 100k IP addresses per Class A subnet
    • Logarithm scale: brightest tiles are 10 to 1000 times more likely to attack.
  • 30. Feature Intuition: DNS features
    • Who resolves to this IP address?
    • Number of domains that resolve to the IP address
    • Distribution of their lifetime
    • Entropy, size, ccTLDs
    • Registrar information
    • Reverse DNS information…
    • History of DNS registration…
    • (Thanks, DNSDB!)
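
A few of these features can be computed directly from the list of domain names that resolve to an IP address. The sketch below is one possible reading of "entropy, size, ccTLDs": Shannon entropy of the leftmost label, its length, and the share of two-letter (country-code) TLDs. The function names, the feature names, and the sample domains are assumptions; the domain list itself would have to come from a passive DNS source, which is not queried here.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy in bits per character; higher values suggest
    random-looking (possibly algorithmically generated) names."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def dns_features(domains):
    """Aggregate simple features over the domains resolving to one IP."""
    names = [d.lower().rstrip(".") for d in domains]
    if not names:
        return {"domain_count": 0, "mean_entropy": 0.0,
                "mean_length": 0.0, "cctld_fraction": 0.0}
    labels = [name.split(".")[0] for name in names]     # leftmost label
    tlds = [name.rsplit(".", 1)[-1] for name in names]  # rightmost label
    return {
        "domain_count": len(names),
        "mean_entropy": sum(shannon_entropy(l) for l in labels) / len(labels),
        "mean_length": sum(len(l) for l in labels) / len(labels),
        # Heuristic: two-letter ASCII TLDs are country-code TLDs.
        "cctld_fraction": sum(len(t) == 2 for t in tlds) / len(tlds),
    }


# Hypothetical domains observed resolving to a single IP address.
print(dns_features(["example.com", "example.br", "xk2q9vz7.example.net"]))
```

Lifetime distribution, registrar, and registration history need historical records and are left out of this sketch.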
  • 31. Training the Model
    • YAY! We have a bunch of numbers per IP address/domain!
    • How do you define what is malicious or not?
    • “Advanced expertise in both information security and data science will be a necessary ingredient in enabling accurate discrimination between malicious and benign activity.” - Anton Chuvakin, Gartner
    • Kinda easy for security tools (if you trust them)
    • Web application logs need deeper statistical analysis
    • Not normal / standard deviation thing
  • 32. How do I get started on this?
    • Programming is a must (Python / R)
    • Statistical knowledge keeps you from making dumb mistakes
    • Specific machine learning courses and books:
      – Coursera (ML / Data Analysis / Data Science)
    • Practice, Practice, Practice:
      – Explore your data! – (Security Onion)
      – Kaggle
      – KDD, VAST, VizSec
  • 33. MLSec Project
    • Sign up, send logs, receive reports generated by machine learning models!
    • Working with several companies on trying out these models on their environment with their data
    • We are hiring (KINDA)
    • Visit https://www.mlsecproject.org , message @MLSecProject or just e-mail me.
  • 34. MLSec Project - Current Research
    • Inbound attacks on exposed services (DEFCON/BH 2013):
      – Information from inbound connections on firewalls, IPS, WAFs
      – Feature extraction and supervised learning
    • Malware Distribution and Botnets:
      – Information from outbound connections on firewalls, DNS and Web Proxy
      – Initial labeling provided by intelligence feeds and AV/anti-malware
      – Semi-supervised learning involved
    • Kill-chain Ensemble Models:
      – Increased precision by composing different behaviors
      – Web server path -> go through Firewall, then IPS, then WAF
      – Early confirmation of attack failure or success
  • 35. Thanks!
    • Q&A?
    • Feedback?
    Alexandre Pinto
    @alexcpsec
    @MLSecProject
    https://www.mlsecproject.org/
    "Essentially, all models are wrong, but some are useful." - George E. P. Box