SlideShare a Scribd company logo
1 of 32
Download to read offline
BotMagnifier: Locating Spambots on the Internet




                             Gianluca Stringhini
                               Thorsten Holz
                              Brett Stone-Gross
                             Christopher Kruegel
                               Giovanni Vigna

                            USENIX Security Symposium


                               August 12, 2011
Spam is a big problem
Spam is sneaky
Spam is sneaky
Tracking Spambots is important


Botnets are responsible for 85% of worldwide spam




  ISPs and organizations can clean up their networks
  Existing blacklists (DNSBL) can be improved
  Mitigation efforts can be directed to the most aggressive botnets
Tracking Spambots is challenging

    The IP addresses of infected machines change frequently
    It is easy to recruit “new members” into a botnet
e

An approach is to set up spam traps. However, a few problems arise:


    Only a subset of the bots will send emails to the spam trap
    addresses
    Some botnets target only users located in certain countries
Basic Insight

Bots that belong to the same botnet share similarities


As a result, they will follow a similar behavior when sending spam


Commoditized botnets could appear as multiple botnets


By observing a portion of a botnet, it is possible to identify more bots
that belong to it
Our Approach
Our Approach
Our Approach
Our Approach
Our Approach
Our Approach
Our Approach
Input Datasets


How can we achieve this?



Our approach takes two datasets as input:
  The IP addresses of known spamming bots, grouped by spam
  campaign (seed pools)
  A log of email transactions carried out on the Internet, both
  legitimate and malicious (transaction log)
Our System

We implemented our approach in a tool, called BotMagnifier


We used a large spam trap to populate seed pools


We used the logs of a Spamhaus mirror as transaction log
  Each query to the Spamhaus mirror corresponds to an email
  We show how BotMagnifier also works when using other
  datasets as transaction logs
Our System


BotMagnifier is executed periodically


It takes as input a set of seed pools


At the end of each observation period, it outputs:
  The IP addresses of the bots in the magnified pools
  The name of the botnet that carried out each campaign
Phase I: Building Seed Pools

Set of IP addresses that participated in a specific spam campaign

Built using the data of a spam trap set up by a large US ISP

                       ≈ 1M messages / day


We consider messages with similar subject lines as part of the same
campaign

Design decisions:
  Minimum seed pool size: 1,000 IP addresses
  Observation period: 1 day
Phase II: Characterizing Bot Behavior

For each seed pool:
  We query the transaction log to find all the events that are
  associated with the IP addresses in it
  We analyze the set of destinations targeted and build a target set

Problem
The target sets of two botnets might have substantial overlaps


We extract the set of destinations that are unique to each seed pool
(characterizing set)
Phase III: Bot Magnification

Goal: find the IP addresses of previously-unknown bots


BotMagnifier considers an IP address x as behaving similarly to
the bots in a seed pool if:
  x sent emails to at least N destinations in the target set
  x never sent an email to a destination outside the target set
  x has contacted at least one destination in the characterizing set


How large should N be?
Threshold Computation
N should be greater for campaigns targeting a larger number of
destinations


                      N = k · |T (pi )|, 0 < k ≤ 1

where |T (pi )| is the size of the target set, and k is a parameter

Precision vs. Recall analysis on ten campaigns for which we had
ground truth (coming from Cutwail C&C servers)


               k = kb +      α
                          |T (pi )|   → kb = 8 · 10−4 , α = 10
Phase IV: Spam Attribution
We want to “label” spam campaigns based on the botnet that carried
them out

Running Malware Samples
We match the subject lines observed in the wild with the ones of the
bots we ran


Botnet Clustering
  IP overlap
  Destination distance
  Bot distance
Validation of the Approach
To validate our approach, we studied Cutwail, for which we had direct
data about the IP addresses of the infected machines
The C&C servers we analyzed accounted for approximately 30% of
the botnet
We ran the validation experiment for the period between July 28 and
August 16, 2010
For each of the 18 days:
  We selected a subset of the IP addresses referenced by the C&C
  servers
  With the help of the spam trap, we identified the campaigns
  carried out
  We generated the seed and magnified pools
BotMagnifier identified 144,317 IP addresses as bots. Of these,
33,550 were actually listed in the C&C databases (≈ 23%).
Overview of Tracking Results
We ran our system between September 28, 2010 and February 5, 2011
BotMagnifier tracked 2,031,110 bot IP addresses
Of these, 925,978 belonged to magnified pools, while the others
belonged to seed pools
1.6% estimated false positives


 Botnet     Total # of IP addresses    # of ”static“ IP addresses
  Lethic            887,852                     117,335
 Rustock            676,905                     104,460
 Cutwail            319,355                      34,132
 MegaD               68,117                      3,055
 Waledac             36,058                      3,450
Overview of Tracking Results
Overview of Tracking Results
Overview of Tracking Results
Overview of Tracking Results
Application of Results
Can BotMagnifier improve existing blacklists?

We analyzed the email logs from the UCSB CS mail server from
November 30, 2010 to February 8, 2011
  If a mail got delivered, the IP address was not blacklisted at the
  time
  The spam ratios computed by SpamAssassin provide us with
  ground truth
28,563 emails were marked as spam, 10,284 IP addresses involved.
295 of them were detected by BotMagnifier, for a total of 1,225
emails (≈ 4%)

We then looked for false positives. BotMagnifier wrongly
identified 12 out of 209,013 IP addresses as bots.
Data Stream Independence

We show how BotMagnifier can be used on alternative datasets,
too

We used the netflow logs from an ISP backbone routers
1.9M emails logged per day

We had to use new values for kb and α

The experiment lasted from January 20, 2011 to January 28, 2011.

BotMagnifier identified 36,739 in magnified pools. This grew the
seed pools by 38%.
Conclusions


We presented BotMagnifier, a tool for tracking and analyzing
spamming botnets


We showed that our approach is able to accurately identify and track
botnets


By using more comprehensive datasets, the magnification results
would get better
Thanks!


email: gianluca@cs.ucsb.edu
twitter: @gianlucaSB

More Related Content

Viewers also liked

The Harvester, the Botmaster, and the Spammer: On the Relations Between the D...
The Harvester, the Botmaster, and the Spammer: On the Relations Between the D...The Harvester, the Botmaster, and the Spammer: On the Relations Between the D...
The Harvester, the Botmaster, and the Spammer: On the Relations Between the D...Gianluca Stringhini
 
Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web Pages
Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web PagesShady Paths: Leveraging Surfing Crowds to Detect Malicious Web Pages
Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web PagesGianluca Stringhini
 
The Tricks of the Trade: What Makes Spam Campaigns Successful?
The Tricks of the Trade: What Makes Spam Campaigns Successful?The Tricks of the Trade: What Makes Spam Campaigns Successful?
The Tricks of the Trade: What Makes Spam Campaigns Successful?Gianluca Stringhini
 
Follow the Green: Growth and Dynamics on Twitter Follower Markets
Follow the Green: Growth and Dynamics on Twitter Follower MarketsFollow the Green: Growth and Dynamics on Twitter Follower Markets
Follow the Green: Growth and Dynamics on Twitter Follower MarketsGianluca Stringhini
 
EvilCohort: Detecting Communities of Malicious Accounts on Online Services
EvilCohort: Detecting Communities of Malicious Accounts on Online ServicesEvilCohort: Detecting Communities of Malicious Accounts on Online Services
EvilCohort: Detecting Communities of Malicious Accounts on Online ServicesGianluca Stringhini
 
The Spammer, the Botmaster, and the Researcher: On the Arms Race in Spamming ...
The Spammer, the Botmaster, and the Researcher: On the Arms Race in Spamming ...The Spammer, the Botmaster, and the Researcher: On the Arms Race in Spamming ...
The Spammer, the Botmaster, and the Researcher: On the Arms Race in Spamming ...Gianluca Stringhini
 

Viewers also liked (6)

The Harvester, the Botmaster, and the Spammer: On the Relations Between the D...
The Harvester, the Botmaster, and the Spammer: On the Relations Between the D...The Harvester, the Botmaster, and the Spammer: On the Relations Between the D...
The Harvester, the Botmaster, and the Spammer: On the Relations Between the D...
 
Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web Pages
Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web PagesShady Paths: Leveraging Surfing Crowds to Detect Malicious Web Pages
Shady Paths: Leveraging Surfing Crowds to Detect Malicious Web Pages
 
The Tricks of the Trade: What Makes Spam Campaigns Successful?
The Tricks of the Trade: What Makes Spam Campaigns Successful?The Tricks of the Trade: What Makes Spam Campaigns Successful?
The Tricks of the Trade: What Makes Spam Campaigns Successful?
 
Follow the Green: Growth and Dynamics on Twitter Follower Markets
Follow the Green: Growth and Dynamics on Twitter Follower MarketsFollow the Green: Growth and Dynamics on Twitter Follower Markets
Follow the Green: Growth and Dynamics on Twitter Follower Markets
 
EvilCohort: Detecting Communities of Malicious Accounts on Online Services
EvilCohort: Detecting Communities of Malicious Accounts on Online ServicesEvilCohort: Detecting Communities of Malicious Accounts on Online Services
EvilCohort: Detecting Communities of Malicious Accounts on Online Services
 
The Spammer, the Botmaster, and the Researcher: On the Arms Race in Spamming ...
The Spammer, the Botmaster, and the Researcher: On the Arms Race in Spamming ...The Spammer, the Botmaster, and the Researcher: On the Arms Race in Spamming ...
The Spammer, the Botmaster, and the Researcher: On the Arms Race in Spamming ...
 

Similar to BotMagnifier: Locating Spambots on the Internet

Monitoring the Spread of Active Worms in Internet
Monitoring the Spread of Active Worms in InternetMonitoring the Spread of Active Worms in Internet
Monitoring the Spread of Active Worms in InternetIOSR Journals
 
Guarding Against Large-Scale Scrabble In Social Network
Guarding Against Large-Scale Scrabble In Social NetworkGuarding Against Large-Scale Scrabble In Social Network
Guarding Against Large-Scale Scrabble In Social NetworkEditor IJCATR
 
Detecting Spam Zombies by Monitoring Outgoing Messages
Detecting  Spam Zombies  by  Monitoring  Outgoing  MessagesDetecting  Spam Zombies  by  Monitoring  Outgoing  Messages
Detecting Spam Zombies by Monitoring Outgoing Messages Gowtham Chandra
 
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"Jishnu Pradeep
 
Tracing Back The Botmaster
Tracing Back The BotmasterTracing Back The Botmaster
Tracing Back The BotmasterIJERA Editor
 
Detecting Spambot as an Antispam Technique for Web Internet BBS
Detecting Spambot as an Antispam Technique for Web Internet BBSDetecting Spambot as an Antispam Technique for Web Internet BBS
Detecting Spambot as an Antispam Technique for Web Internet BBSijsrd.com
 
Detection of Botnets using Honeypots and P2P Botnets
Detection of Botnets using Honeypots and P2P BotnetsDetection of Botnets using Honeypots and P2P Botnets
Detection of Botnets using Honeypots and P2P BotnetsCSCJournals
 
Botnet detection by Imitation method
Botnet detection  by Imitation methodBotnet detection  by Imitation method
Botnet detection by Imitation methodAcad
 
Synopsis viva presentation
Synopsis viva presentationSynopsis viva presentation
Synopsis viva presentationkirubavenkat
 
A Brief Incursion into Botnet Detection
A Brief Incursion into Botnet DetectionA Brief Incursion into Botnet Detection
A Brief Incursion into Botnet DetectionAnant Narayanan
 
miniproject.ppt.pptx
miniproject.ppt.pptxminiproject.ppt.pptx
miniproject.ppt.pptxAnush90
 
Spams meet cryptocurrencies
Spams meet cryptocurrenciesSpams meet cryptocurrencies
Spams meet cryptocurrenciesMatteo Romiti
 
Lab3code.c#include stdio.h#include stdlib.h#include.docx
Lab3code.c#include stdio.h#include stdlib.h#include.docxLab3code.c#include stdio.h#include stdlib.h#include.docx
Lab3code.c#include stdio.h#include stdlib.h#include.docxsmile790243
 
Towards botnet detection through features using network traffic classification
Towards botnet detection through features using network traffic classificationTowards botnet detection through features using network traffic classification
Towards botnet detection through features using network traffic classificationIJERA Editor
 
Honeypots.ppt1800363876
Honeypots.ppt1800363876Honeypots.ppt1800363876
Honeypots.ppt1800363876Momita Sharma
 

Similar to BotMagnifier: Locating Spambots on the Internet (20)

Monitoring the Spread of Active Worms in Internet
Monitoring the Spread of Active Worms in InternetMonitoring the Spread of Active Worms in Internet
Monitoring the Spread of Active Worms in Internet
 
Guarding Against Large-Scale Scrabble In Social Network
Guarding Against Large-Scale Scrabble In Social NetworkGuarding Against Large-Scale Scrabble In Social Network
Guarding Against Large-Scale Scrabble In Social Network
 
Detecting Spam Zombies by Monitoring Outgoing Messages
Detecting  Spam Zombies  by  Monitoring  Outgoing  MessagesDetecting  Spam Zombies  by  Monitoring  Outgoing  Messages
Detecting Spam Zombies by Monitoring Outgoing Messages
 
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
Paper Presentation - "Your Botnet is my Botnet : Analysis of a Botnet Takeover"
 
Tracing Back The Botmaster
Tracing Back The BotmasterTracing Back The Botmaster
Tracing Back The Botmaster
 
B0940509
B0940509B0940509
B0940509
 
Detecting Spambot as an Antispam Technique for Web Internet BBS
Detecting Spambot as an Antispam Technique for Web Internet BBSDetecting Spambot as an Antispam Technique for Web Internet BBS
Detecting Spambot as an Antispam Technique for Web Internet BBS
 
Detection of Botnets using Honeypots and P2P Botnets
Detection of Botnets using Honeypots and P2P BotnetsDetection of Botnets using Honeypots and P2P Botnets
Detection of Botnets using Honeypots and P2P Botnets
 
Botnet detection by Imitation method
Botnet detection  by Imitation methodBotnet detection  by Imitation method
Botnet detection by Imitation method
 
Botnets
BotnetsBotnets
Botnets
 
Synopsis viva presentation
Synopsis viva presentationSynopsis viva presentation
Synopsis viva presentation
 
A Brief Incursion into Botnet Detection
A Brief Incursion into Botnet DetectionA Brief Incursion into Botnet Detection
A Brief Incursion into Botnet Detection
 
2 dc meet new
2 dc meet new2 dc meet new
2 dc meet new
 
Botnets
BotnetsBotnets
Botnets
 
miniproject.ppt.pptx
miniproject.ppt.pptxminiproject.ppt.pptx
miniproject.ppt.pptx
 
Spams meet cryptocurrencies
Spams meet cryptocurrenciesSpams meet cryptocurrencies
Spams meet cryptocurrencies
 
Lab3code.c#include stdio.h#include stdlib.h#include.docx
Lab3code.c#include stdio.h#include stdlib.h#include.docxLab3code.c#include stdio.h#include stdlib.h#include.docx
Lab3code.c#include stdio.h#include stdlib.h#include.docx
 
Towards botnet detection through features using network traffic classification
Towards botnet detection through features using network traffic classificationTowards botnet detection through features using network traffic classification
Towards botnet detection through features using network traffic classification
 
Honeypots.ppt1800363876
Honeypots.ppt1800363876Honeypots.ppt1800363876
Honeypots.ppt1800363876
 
Botnets 101
Botnets 101Botnets 101
Botnets 101
 

Recently uploaded

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 

Recently uploaded (20)

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 

BotMagnifier: Locating Spambots on the Internet

  • 1. BotMagnifier: Locating Spambots on the Internet Gianluca Stringhini Thorsten Holz Brett Stone-Gross Christopher Kruegel Giovanni Vigna USENIX Security Symposium August 12, 2011
  • 2. Spam is a big problem
  • 5. Tracking Spambots is important Botnets are responsible for 85% of worldwide spam ISPs and organizations can clean up their networks Existing blacklists (DNSBL) can be improved Mitigation efforts can be directed to the most aggressive botnets
  • 6. Tracking Spambots is challenging The IP addresses of infected machines change frequently It is easy to recruit “new members” into a botnet e An approach is to set up spam traps. However, a few problems arise: Only a subset of the bots will send emails to the spam trap addresses Some botnets target only users located in certain countries
  • 7. Basic Insight Bots that belong to the same botnet share similarities As a result, they will follow a similar behavior when sending spam Commoditized botnets could appear as multiple botnets By observing a portion of a botnet, it is possible to identify more bots that belong to it
  • 15. Input Datasets How can we achieve this? Our approach takes two datasets as input: The IP addresses of known spamming bots, grouped by spam campaign (seed pools) A log of email transactions carried out on the Internet, both legitimate and malicious (transaction log)
  • 16. Our System We implemented our approach in a tool, called BotMagnifier We used a large spam trap to populate seed pools We used the logs of a Spamhaus mirror as transaction log Each query to the Spamhaus mirror corresponds to an email We show how BotMagnifier also works when using other datasets as transaction logs
  • 17. Our System BotMagnifier is executed periodically It takes as input a set of seed pools At the end of each observation period, it outputs: The IP addresses of the bots in the magnified pools The name of the botnet that carried out each campaign
  • 18. Phase I: Building Seed Pools Set of IP addresses that participated in a specific spam campaign Built using the data of a spam trap set up by a large US ISP ≈ 1M messages / day We consider messages with similar subject lines as part of the same campaign Design decisions: Minimum seed pool size: 1,000 IP addresses Observation period: 1 day
  • 19. Phase II: Characterizing Bot Behavior For each seed pool: We query the transaction log to find all the events that are associated with the IP addresses in it We analyze the set of destinations targeted and build a target set Problem The target sets of two botnets might have substantial overlaps We extract the set of destinations that are unique to each seed pool (characterizing set)
  • 20. Phase III: Bot Magnification Goal: find the IP addresses of previously-unknown bots BotMagnifier considers an IP address x as behaving similarly to the bots in a seed pool if: x sent emails to at least N destinations in the target set x never sent an email to a destination outside the target set x has contacted at least one destination in the characterizing set How large should N be?
  • 21. Threshold Computation N should be greater for campaigns targeting a larger number of destinations N = k · |T (pi )|, 0 < k ≤ 1 where |T (pi )| is the size of the target set, and k is a parameter Precision vs. Recall analysis on ten campaigns for which we had ground truth (coming from Cutwail C&C servers) k = kb + α |T (pi )| → kb = 8 · 10−4 , α = 10
  • 22. Phase IV: Spam Attribution We want to “label” spam campaigns based on the botnet that carried them out Running Malware Samples We match the subject lines observed in the wild with the ones of the bots we ran Botnet Clustering IP overlap Destination distance Bot distance
  • 23. Validation of the Approach To validate our approach, we studied Cutwail, for which we had direct data about the IP addresses of the infected machines The C&C servers we analyzed accounted for approximately 30% of the botnet We ran the validation experiment for the period between July 28 and August 16, 2010 For each of the 18 days: We selected a subset of the IP addresses referenced by the C&C servers With the help of the spam trap, we identified the campaigns carried out We generated the seed and magnified pools BotMagnifier identified 144,317 IP addresses as bots. Of these, 33,550 were actually listed in the C&C databases (≈ 23%).
  • 24. Overview of Tracking Results We ran our system between September 28, 2010 and February 5, 2011 BotMagnifier tracked 2,031,110 bot IP addresses Of these, 925,978 belonged to magnified pools, while the others belonged to seed pools 1.6% estimated false positives Botnet Total # of IP addresses # of ”static“ IP addresses Lethic 887,852 117,335 Rustock 676,905 104,460 Cutwail 319,355 34,132 MegaD 68,117 3,055 Waledac 36,058 3,450
  • 29. Application of Results Can BotMagnifier improve existing blacklists? We analyzed the email logs from the UCSB CS mail server from November 30, 2010 to February 8, 2011 If a mail got delivered, the IP address was not blacklisted at the time The spam ratios computed by SpamAssassin provide us with ground truth 28,563 emails were marked as spam, 10,284 IP addresses involved. 295 of them were detected by BotMagnifier, for a total of 1,225 emails (≈ 4%) We then looked for false positives. BotMagnifier wrongly identified 12 out of 209,013 IP addresses as bots.
  • 30. Data Stream Independence We show how BotMagnifier can be used on alternative datasets, too We used the netflow logs from an ISP backbone routers 1.9M emails logged per day We had to use new values for kb and α The experiment lasted from January 20, 2011 to January 28, 2011. BotMagnifier identified 36,739 in magnified pools. This grew the seed pools by 38%.
  • 31. Conclusions We presented BotMagnifier, a tool for tracking and analyzing spamming botnets We showed that our approach is able to accurately identify and track botnets By using more comprehensive datasets, the magnification results would get better